T-DNA promoters of the Ri plasmid.



The sequence of the TL-DNA of Ri plasmids found in
Agrobacterium rhizogenes strains HRI and A4 is disclosed.
Sixteen open reading frames bounded by eukaryotic promoters,
ribosome binding sites, and polyadenylation sites were
found, five of which were observed to be transcribed in a
developmentally and phenotypically regulated manner. The
use of promoters and polyadenylation sites from pRi TL-DNA
to control expression of heterologous foreign structural
genes is taught, using as examples the structural genes for
Phaseolus vulgaris storage protein (phaseolin), P. vulgaris
lectin, a sweet protein (thaumatin), and Bacillus thuringiensis
crystal protein. Vectors useful for manipulation of sequences
of the structural genes and T-DNA are also provided.

Ri T-DNA PROMOTERS



FIELD 

The present invention is in the fields of genetic engineering and
plant husbandry, and especially provides means for promotion of
transcription in plants.

BACKGROUND 

Following are publications which disclose background information
related to the present invention. These publications are discussed in
greater depth in the Background sections indicated. Restriction maps of
Ri plasmids are disclosed by G. A. Huffman et al. (1984) J. Bacteriol.
157:269-276; L. Jouanin (1984) Plasmid 12:91-102; and M. Pomponi et al.
(1983) Plasmid 10:119-129 (see TIP Plasmid DNA). L. Herrera-Estrella
et al. (1983) Nature 303:209-213, provides examples of use of the nos
promoter to drive expression in plants of heterologous foreign structural
genes. N. Murai et al. (1983) Science 222:476-482, reported that the ocs
promoter could drive expression of an intron-containing fusion gene having
foreign coding sequences (see Manipulations of the TIP Plasmids). R. F.
Barker et al. (1983) Plant Molec. Biol. 2:335-350, and R. F. Barker and
J. D. Kemp, U.S. Patent application ser. no. 553,786, disclose the complete
sequence of the T-DNA from the octopine-type plasmid pTi15955; homologous
published sequences of other Ti plasmid genes are referenced therein.
Barker and Kemp also taught use of various octopine T-DNA promoters to
drive expression in plants of various structural genes (see Genes on the
TIP Plasmids).

Shuttle Vectors 

Shuttle vectors, developed by G. B. Ruvkun and F. M. Ausubel (1981)
Nature 289:85-88, provide means for inserting foreign genetic material
into large DNA molecules and include copies of recipient genome DNA
sequences into which the foreign genetic material is inserted. Shuttle
vectors can be introduced into a recipient cell by well-known methods,
including the tri-parental mating technique (Ruvkun and Ausubel, supra),
direct transfer of a self-mobilizable vector in a bi-parental mating,
direct uptake of exogenous DNA by Agrobacterium cells ("transformation"),
spheroplast fusion of Agrobacterium with another bacterial cell, and
uptake of liposome-encapsulated DNA. After a shuttle vector is introduced
into a recipient cell, possible events include a double cross-over with
one recombinational event on either side of the marker (homogenotization).
Phenotypically dominant traits may be introduced by single cross-over
events (cointegration) (A. Caplan et al. (1983) Science 222:815-821; R. B.
Horsch et al. (1984) Science 223:496-498); one must guard against deletion
of the resulting tandem duplication. Shuttle vectors have proved useful
in manipulation of Agrobacterium plasmids.

"Suicide vectors" (e.g. R. Simon et al. (1983) Biotechnol. 1:784-791) are
shuttle vectors having replicons not independently maintainable within the
recipient cell. Use of suicide vectors to transfer DNA sequences into a Ti
plasmid has been reported (e.g. E. Van Haute et al. (1983) EMBO J.
2:411-417; L. Comai et al. (1983) Plasmid 10:21-30; P. Zambryski et al.
(1983) EMBO J. 2:2143-2150; P. Zambryski et al. (1984) in Genetic
Engineering, Principles, and Methods, 6, eds: A. Hollaender and J. Setlow;
P. Zahn et al. (1984) Mol. Gen. Genet. 194:188-194; Caplan et al., supra;
and C. H. Shaw et al. (1983) Gene 28:315-330).

Overview of Agrobacterium 

Included within the gram-negative genus Agrobacterium are the species
A. tumefaciens and A. rhizogenes, respectively the causal agents of crown
gall disease and hairy root disease of gymnosperm and dicotyledonous
angiosperm plants. In both diseases, the inappropriately growing plant
tissue usually produces one or more amino acid derivatives known as
opines, which may be classified into families whose type members include
octopine, nopaline, mannopine, and agropine.

Virulent strains of Agrobacterium harbor large plasmids known as Ti
(tumor-inducing) plasmids (pTi) in A. tumefaciens and Ri (root-inducing)
plasmids (pRi) in A. rhizogenes, often classified by the opine which they
cause to be synthesized. Ti and Ri plasmids both contain DNA sequences,
referred to as T-DNA (transferred-DNA), which in tumors are found to be
integrated into the genome of the host plant. Several T-DNA genes are
under control of T-DNA promoters which resemble the canonical eukaryotic
promoter in structure. The Ti plasmid also carries genes outside the
T-DNA region. The set of genes and DNA sequences responsible for
transforming the plant cell is hereinafter collectively referred to as the
transformation-inducing principle (TIP). The term TIP therefore includes,
but is not limited to, both Ti and Ri plasmids.

General reviews of Agrobacterium-caused disease include those by
D. J. Merlo (1982) Adv. Plant Pathol. 1:139-178; L. W. Ream and M. P.
Gordon (1982) Science 218:854-859; M. W. Bevan and M.-D. Chilton (1982)
Ann. Rev. Genet. 16:357-384; G. Kahl and J. Schell (1982) Molecular
Biology of Plant Tumors; K. A. Barton and M.-D. Chilton (1983) Meth.
Enzymol. 101:527-539; A. Depicker et al. (1983) in Genetic Engineering of
Plants: an Agricultural Perspective, eds: T. Kosuge et al., pp. 143-176;
A. Caplan et al. (1983) Science 222:815-821; T. C. Hall et al., European
Patent application 126,546; and A. N. Binns (1984) Oxford Surveys Plant
Mol. Cell Biol. 1:130-160. A number of more specialized reviews can be
found in A. Pühler, ed. (1983) Molecular Genetics of the Bacteria-Plant
Interaction, including a treatment by D. Tepfer of A. rhizogenes-mediated
transformation (pp. 248-258). R. A. Schilperoort (1984) in Efficiency in
Plant Breeding (Proc. 10th Congr. Eur. Assoc. Res. Plant Breeding), eds:
W. Lange et al., pp. 251-285, discusses Agrobacterium-based plant
transformation in the context of the art of plant genetic engineering and
plant improvement.

Infection of Plant Tissues 

Plant cells can be transformed by Agrobacterium by several methods
known to the art. For a review of recent work, see K. Syono (1984) Oxford
Surveys Plant Mol. Cell Biol. 1:217-219. In the present invention, any
method will suffice as long as the gene is stably transmitted through
mitosis and meiosis.

The infection of plant tissue by Agrobacterium is a simple technique
well known to those skilled in the art. Typically, after being wounded, a
plant is inoculated with a suspension of tumor-inducing bacteria.
Alternatively, tissue pieces are inoculated, e.g. leaf disks (R. B. Horsch
et al. (1985) Science 227:1229-1231) or inverted stem segments (K. A.
Barton et al. (1983) Cell 32:1033-1043). After induction, the tumors can
be placed in tissue culture on media lacking the phytohormones usually
included for culture of untransformed plant tissue. Traditional inoculation
and culture techniques may be modified for use of disarmed T-DNA vectors
incapable of inducing hormone-independent growth (e.g. see P. Zambryski
et al. (1984) in Genetic Engineering, Principles, and Methods, 6, eds.:
A. Hollaender and J. Setlow).

Agrobacterium is also capable of infecting isolated cells, cells
grown in culture, callus cells, and isolated protoplasts (e.g. R. B.
Horsch and R. T. Fraley (1983) in Advances in Gene Technology: Molecular
Genetics of Plants and Animals (Miami Winter Symposium 20), eds.:
K. Downey et al., p. 576; R. T. Fraley et al. (1984) Plant Mol. Biol.
3:371-378; R. T. Fraley and R. B. Horsch (1983) in Genetic Engineering of
Plants: an Agricultural Perspective, eds.: T. Kosuge et al., pp. 177-194;
A. Muller et al. (1983) Biochem. Biophys. Res. Comm. 123:458-462).
The transformation frequency of inoculated callus pieces can be increased
by addition of an opine or opine precursors (L. M. Cello and W. L. Olsen,
U.S. Patent 4,459,355).

Plant protoplasts can be transformed by the direct uptake of TIP DNA
in the presence of a polycation, polyethylene glycol, or both (e.g. F. A.
Krens et al. (1982) Nature 296:72-74), though the integrated Ti plasmid
may include non-T-DNA sequences.

An alternative method involves uptake of DNA surrounded by membranes.
pTi DNA may be introduced via liposomes or by fusion of plant and
bacterial cells after removal of their respective cell walls (e.g. R. Hain
et al. (1984) Plant Cell Rept. 3:60-64). Plant protoplasts can take up
cell-wall-delimited Agrobacterium cells. T-DNA can be transmitted to
tissue regenerated from fused protoplasts.

The host range of crown gall pathogenesis may be influenced by
T-DNA-encoded functions such as onc genes (A. Hoekema et al. (1984)
J. Bacteriol. 158:383-385; A. Hoekema et al. (1984) EMBO J. 3:3043-3047;
W. C. Buchholz and M. F. Thomashow (1984) 160:327-332). R. L. Ausich,
European Patent Application 108,580, reports transfer of T-DNA from
A. tumefaciens to green algal cells, and expression therein of octopine
synthase and Tn5 kanamycin resistance genes. G. M. S. Hooykaas-van
Slogteren et al. (1984) Nature 311:763-764, and J.-P. Hernalsteens
et al. (1984) EMBO J. 3:3039-3041, have demonstrated transformation of
monocot cells by Agrobacterium without the customary tumorigenesis.

Regeneration of Plants 

Differentiated plant tissues with normal morphology have been
obtained from crown gall tumors. For example, L. Otten et al. (1981)
Molec. Gen. Genet. 183:209-213, used tms (shoot-inducing, root-suppressing)
Ti plasmid mutants to create tumors which proliferated shoots that formed
self-fertile flowers. The resultant seeds germinated into plants which
contained T-DNA and made opines. The tms phenotype can be partly overcome
by washing of the rooting area and can be bypassed by grafting onto a
normal stock (A. Wostemeyer et al. (1984) Mol. Gen. Genet. 194:500-507).
Similar experiments with a tmr (root-inducing, shoot-suppressing) mutant
showed that full-length T-DNA could be transmitted through meiosis to
progeny and that in those progeny nopaline genes could be expressed,
though at variable levels (K. A. Barton et al. (1983) Cell 32:1033-1043).

Genes involved in opine anabolism were capable of passing through
meiosis, though the plants were male sterile if the T-DNA was not
disarmed. Seemingly unaltered T-DNA and functional foreign genes can be
inherited in a dominant, closely linked, Mendelian fashion. Genetically,
T-DNA genes are closely linked in regenerated plants (A. Wostemeyer et al.
(1984) Mol. Gen. Genet. 194:500-507; R. B. Horsch et al. (1984) Science
223:496-498; D. Tepfer (1984) Cell 37:959-967).

The epigenetic state of the plant cells initially transformed can
affect regeneration potential (G. M. S. van Slogteren et al. (1983) Plant
Mol. Biol. 2:321-333).

Roots resulting from transformation by A. rhizogenes have proven
relatively easy to regenerate directly into plantlets (M.-D. Chilton
et al. (1982) Nature 295:432-434; D. Tepfer (1984) Cell 37:959-967; Tepfer
(1983) in Pühler, supra), and are easily cloned. Regenerability from
transformed roots may be dependent on T-DNA copy number (C. David et al.
(1984) Biotechnol. 2:73-76). Hairy root regenerants have a rhizogenic
potential and isozyme pattern not found in untransformed plants
(P. Costantino et al. (1984) J. Mol. Appl. Genet. 2:465-470). The
phenotype of these plants is generally altered, although not necessarily
deleteriously.

Genes on the TIP Plasmids

The complete sequence of the T-DNA of an octopine-type plasmid found
in ATCC 15955, pTi15955, has been reported (R. F. Barker et al. (1983)
Plant Molec. Biol. 2:335-350), as has that of the TL region of pTiAch5
(J. Gielen et al. (1984) EMBO J. 3:835-846). Published T-DNA genes do not
contain introns and do have sequences that resemble canonical eukaryotic
promoter elements and polyadenylation sites.
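The following Python sketch is offered only as an illustration of what scanning a T-DNA sequence for such canonical elements might involve. It assumes exact-match textbook consensus motifs (a TATA box written as TATAWAW, where W is A or T; the CCAAT core of the "CAAT" box; and the AATAAA polyadenylation signal). These patterns and the function names are illustrative assumptions, not the criteria used in the work cited above; a practical promoter analysis would also weigh position relative to the transcription start and tolerate mismatches.

```python
import re

# Illustrative consensus motifs (assumptions, not taken from the cited work).
# Lookahead groups let overlapping occurrences be reported.
MOTIFS = {
    "TATA box": re.compile(r"(?=(TATA[AT]A[AT]))"),  # TATAWAW, W = A or T
    "CAAT box": re.compile(r"(?=(CCAAT))"),          # CCAAT core
    "polyA signal": re.compile(r"(?=(AATAAA))"),     # canonical AATAAA
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string (other letters pass through)."""
    return seq.translate(_COMPLEMENT)[::-1]


def scan_motifs(seq: str) -> list[tuple[str, str, int, str]]:
    """Return (motif, strand, start, text) for every exact motif match.

    `start` is the 0-based forward-strand coordinate of the match's leftmost
    base; for '-' strand hits, `text` is the motif as read on that strand.
    """
    seq = seq.upper()
    rc = revcomp(seq)
    n = len(seq)
    hits = []
    for name, pattern in MOTIFS.items():
        for m in pattern.finditer(seq):
            hits.append((name, "+", m.start(), m.group(1)))
        for m in pattern.finditer(rc):
            length = len(m.group(1))
            # Map the reverse-complement coordinate back to the forward strand.
            hits.append((name, "-", n - m.start() - length, m.group(1)))
    return sorted(hits, key=lambda h: (h[2], h[0], h[1]))
```

For example, scan_motifs("TATAAAT") reports a single plus-strand TATA-box hit at position 0, while scan_motifs("TTTATT") reports an AATAAA signal on the minus strand.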

Ti plasmids having mutations in the genes tms, tmr, tml, and ocs
respectively incite tumorous calli of Nicotiana tabacum which generate
shoots, proliferate roots, are larger than normal, and do not synthesize
octopine; all but ocs are onc (oncogenicity) genes. In other hosts,
mutants of these genes can induce different phenotypes (see M. W. Bevan
and M.-D. Chilton (1982) Ann. Rev. Genet. 16:357-384). Mutations in T-DNA
genes do not seem to affect the insertion of T-DNA into the plant genome
(J. Leemans et al. (1982) EMBO J. 1:147-152; L. W. Ream et al. (1983)
Proc. Natl. Acad. Sci. USA 80:1660-1664).

Octopine Ti plasmids carry an ocs gene which encodes octopine synthase
(lysopine dehydrogenase). All upstream signals necessary for expression
of the ocs gene are found within 295 bp of the ocs transcriptional start
site (C. Koncz et al. (1983) EMBO J. 2:1597-1603). P. Dhaese et al.
(1983) EMBO J. 2:419-426, reported the utilization of various
polyadenylation sites by "transcript 7" (ORF3 of Barker et al., supra) and
ocs. The presence of the enzyme octopine synthase within a tissue can
protect that tissue from the toxic effect of various amino acid analogs
(G. A. Dahl and J. Tempé (1983) Theor. Appl. Genet. 66:233-239; M. G.
Koziel et al. (1984) J. Mol. Appl. Genet. 2:549-562).

Nopaline Ti plasmids encode the nopaline synthase gene (nos)
(sequenced by A. Depicker et al. (1982) J. Mol. Appl. Genet. 1:561-573).
The "CAAT" box, but not upstream sequences therefrom, is required for
wild-type levels of nos expression; a partial or complete "TATA" box
supports very low-level nos activity (C. H. Shaw et al. (1984) Nucl.
Acids Res. 12:7831-7846). Genes equivalent to tms and tmr have been
identified on a nopaline-type plasmid, and a number of transcripts have
been mapped (L. Willmitzer et al. (1983) Cell 32:1045-1056).

Transcription from hairy root T-DNA has also been detected
(L. Willmitzer et al. (1982) Mol. Gen. Genet. 186:16-22). Ri plasmids and
tms⁻ Ti plasmids can complement each other when inoculated onto plants,
resulting in calli capable of hormone-independent growth (G. M. S.
van Slogteren (1983) Ph.D. thesis, Rijksuniversiteit te Leiden,
Netherlands).

TIP plasmid genes outside of the T-DNA region include the vir genes,
which when mutated result in an avirulent Ti plasmid. Several vir genes
have been accurately mapped and have been found to be located in regions
conserved among various Ti plasmids (V. N. Iyer et al. (1982) Mol. Gen.
Genet. 188:418-424). The vir genes function in trans, being capable of
causing the transformation of plant cells with T-DNA of a different
plasmid type and physically located on another plasmid (e.g. A. J.
de Framond et al. (1983) Biotechnol. 1:262-269; A. Hoekema et al. (1983)
Nature 303:179-180; J. Hille et al. (1984) J. Bacteriol. 158:754-756;
A. Hoekema et al. (1984) J. Bacteriol. 158:383-385); such arrangements are
known as binary systems. Chilton et al. (18 January 1983) 15th Miami
Winter Symp., described a "micro-Ti" plasmid made by resectioning the
"mini-Ti" of de Framond et al., supra (see European Patent application
125,546 for a description). G. A. Dahl et al., U.S. Patent application
ser. no. 532,280, and A. Hoekema (1985) Ph.D. Thesis, Rijksuniversiteit te
Leiden, The Netherlands, disclose micro-Ti plasmids carrying ocs genes
constructed from pTi15955. M. Bevan (1984) Nucl. Acids Res. 12:8711-8721,
discloses a kanamycin-resistant micro-Ti. T-DNA need not be on a plasmid
to transform a plant cell; chromosomally located T-DNA is functional
(A. Hoekema et al. (1984) EMBO J. 3:2485-2490). Ti plasmid-determined
characteristics have been reviewed by Merlo, supra (see especially
Table II therein), and Ream and Gordon, supra.

TIP Plasmid DNA

Ri plasmids have been shown to have extensive homology among themselves
(P. Costantino et al. (1981) Plasmid 5:170-182), and to both octopine
(F. F. White and E. W. Nester (1980) J. Bacteriol. 144:710-720) and
nopaline (G. Risuleo et al. (1982) Plasmid 7:45-51) Ti plasmids, primarily
in regions encoding vir genes, replication functions, and opine metabolism
functions (L. Jouanin (1984) Plasmid 12:91-102; K. Lahners et al. (1984)
Plasmid 11:130-140; E. E. Hood et al. (1984) Biotechnol. 2:702-709;
F. Leach (1983) Ph.D. Thesis, Université de Paris-Sud, Centre d'Orsay,
France); none of the homologies are in pRi TL-DNA. pRi T-DNA contains
extensive though weak homologies to T-DNA from both types of Ti plasmid
(L. Willmitzer et al. (1982) Mol. Gen. Genet. 186:16-22). DNA from
several plant species contains sequences, referred to as cT-DNA (cellular
T-DNA), having homology with the Ri plasmid (F. F. White et al. (1983)
Nature 301:348-350; L. Spano et al. (1982) Plant Molec. Biol. 1:291-300;
D. Tepfer (1982) in 2e Colloque sur les Recherches Fruitières, Bordeaux,
pp. 47-59). G. A. Huffman et al. (1984) J. Bacteriol. 157:269-276,
Jouanin, supra, and Leach, supra, have shown that, in the region of
cross-hybridization, the Ri plasmid pRiA4b is more closely related to
pTiA5 (octopine-type) than to pTiT37 (nopaline-type) and that this Ri
plasmid appears to carry sequence homologous to tms but not tmr. Their
results also suggested that Ri T-DNA may be discontinuous, analogous to
the case with octopine T-DNA (see below). The restriction maps of pRiA4b,
pRi1855, and pRiHRI were respectively disclosed by Huffman et al., supra,
M. Pomponi et al. (1983) Plasmid 10:119-129, and L. Jouanin, supra. Ri
plasmids are often characterizable as being agropine-type or
mannopine-type (A. Petit et al. (1983) Mol. Gen. Genet. 190:204-214).

A portion of the Ti or Ri plasmid is found in the DNA of tumorous
plant cells. T-DNA may be integrated (i.e. inserted) into host DNA at
multiple sites in the nucleus. Flanking plant DNA may be either repeated
or low-copy-number sequences. Integrated T-DNA can be found in either
direct or inverted tandem arrays and can be separated by spacers. Much
non-T-DNA Ti plasmid DNA appears to be transferred into the plant cell
prior to T-DNA integration (H. Joos et al. (1983) EMBO J. 2:2151-2160).

T-DNA has direct repeats of about 25 base pairs associated with the
borders, i.e. with the T-DNA/plant DNA junctions, which may be involved in
either transfer from Agrobacterium or integration into the host genome.

and right T-DNAs, respectively. TL (about 15-20 kbp) and TR (about
8-10 kbp) are separated by about 15-20 kbp (Huffman et al., supra;
Jouanin, supra). The region of agropine-type pRi and TR integrated can
vary between individual plants or species inoculated (F. F. White et al.
(1983) Nature 301:348-350; D. A. Tepfer (1984) Cell 37:959-967). Though
T-DNA is occasionally deleted after integration in the plant genome, it is
generally stable. Tumors containing a mixture of cells which differ in
T-DNA organization or copy number are the result of multiple
transformation events.

The exact location of T-DNA/flanking plant DNA junctions relative to the
border repeats varies and need not be within a border repeat. Virulence
is not always eliminated after deletion of either of the usual nopaline
T-DNA border sequences (compare H. Joos et al. (1983) Cell 32:1057-1067
with K. Wang et al. (1984) Cell 38:455-462 and C. H. Shaw et al. (1984)
Nucl. Acids Res. 12:6031-6041, concerning the right border). The
orientation of the right nopaline border can be reversed without total
loss of functionality, and a single border sequence is capable of
transforming closely linked sequences (M. De Block et al. (1984) EMBO J.
3:1681-1689). A synthetic 25 bp nopaline right border repeat is
functional (Wang et al., supra). Circular intermediates associated with
T-DNA transfer appear to be spliced precisely within the 25 bp direct
repeats (Z. Koukolikova-Nicola et al. (1985) Nature 313:191-196).
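Because the border repeats are about 25 bp long and the copies at the two ends need not be identical, one simple way to look for candidate direct repeats between two border regions is an exhaustive comparison of equal-length windows by Hamming distance. The Python sketch below assumes that approach; the function names, the window length k = 25, and the mismatch allowance are illustrative choices rather than values from the cited studies, and the brute-force scan is only practical for short border-region sequences.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def candidate_direct_repeats(left: str, right: str, k: int = 25,
                             max_mismatches: int = 2) -> list[tuple[int, int, int]]:
    """Find k-bp windows repeated in the same orientation in two sequences.

    Returns (i, j, d) where left[i:i+k] matches right[j:j+k] with
    d <= max_mismatches substitutions. Brute force, roughly
    O(len(left) * len(right) * k), so intended only for short inputs.
    """
    left, right = left.upper(), right.upper()
    hits = []
    for i in range(len(left) - k + 1):
        window = left[i:i + k]
        for j in range(len(right) - k + 1):
            d = hamming(window, right[j:j + k])
            if d <= max_mismatches:
                hits.append((i, j, d))
    return hits
```

Setting max_mismatches=0 returns only perfect direct repeats; raising it admits diverged copies at the cost of more spurious hits in low-complexity sequence.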

Manipulations of the TIP Plasmids 

Altered DNA sequences, including deletions, may be inserted into TIP
plasmids (see Shuttle Vectors). Some pTi derivatives can be transferred
to E. coli and mutagenized therein (J. Hille et al. (1983) J. Bacteriol.
154:693-701). P. Zambryski et al. (1983) EMBO J. 2:2143-2150, report use
of a vector deleted for most T-DNA genes to transform tobacco and
regenerate morphologically normal plants.
The nopaline synthase promoter can drive expression of drug resistance
structural genes useful for selection of transformed plant cells.
M. W. Bevan et al. (1983) Nature 304:184-187; R. T. Fraley et al. (1983)
Proc. Natl. Acad. Sci. USA 80:4803-4807; and L. Herrera-Estrella et al.
(1983) EMBO J. 2:987-995, have inserted the bacterial kanamycin resistance
structural gene (neomycin phosphotransferase II, NPT2), or kan, from Tn5
downstream from (i.e. behind or under control of) the nopaline synthase
promoter. The constructions were used to transform plant cells which in
culture were resistant to kanamycin and its analogs such as neomycin and
G418. Promoters for octopine TL genes ORF24 and ORF25 can also drive kan
structural gene expression (J. Velten et al. (1984) EMBO J. 3:2723-2730).
Herrera-Estrella et al., supra, reported a similar construction, in which
a methotrexate resistance gene (dihydrofolate reductase, DHFR) from Tn7
was placed behind the nos promoter; transformed plant cells were
resistant to methotrexate. Furthermore, L. Herrera-Estrella et al. (1983)
Nature 303:209-213, have obtained expression in plant cells of enzymatic
activity of octopine synthase and chloramphenicol acetyl transferase by
placing their structural genes under control of nos promoters. G. Helmer
et al. (1984) Biotechnol. 2:520-527, have created a fusion gene useful as
a screenable marker having the promoter and 5'-end of the nos structural
gene fused to E. coli β-galactosidase (lacZ) sequences.

N. Murai et al. (1983) Science 222:476-482, reported fusion of the
promoter and the 5'-end of the octopine synthase structural gene to a
phaseolin structural gene. The encoded fusion protein was produced under
control of the T-DNA promoter. Phaseolin-derived introns underwent proper
post-transcriptional processing.

SUMMARY OF THE INVENTION

One object of this invention is to provide means for promoting the
expression of structural genes within plant cells wherein said genes are
foreign to said cells. In pursuance of this goal, other objects are to
provide pRi T-DNA promoters and transcript terminators, and especially
pRi TL-DNA-derived promoters and pRi TL-DNA-derived polyadenylation sites,
which are DNA sequences capable of controlling structural gene
transcription and translation within plant cells, and to provide
developmental and phenotypic regulation of said foreign structural genes.
Another object is to provide specialized plant tissues and plants having
within them proteins encoded by foreign structural genes and, in cases
where the protein is an enzyme, having or lacking metabolites or chemicals
which respectively are not or are otherwise found in the cells in which
the gene is inserted. Other objects and advantages will become evident
from the following description.

The invention disclosed herein provides a plant comprising a
genetically modified plant cell having a foreign structural gene
introduced and expressed therein under control of pRi TL-DNA-derived
plant-expressible transcription controlling sequences (TxCS). Further,
the invention provides plant tissue comprising a plant cell whose genome
includes T-DNA comprising a foreign structural gene inserted in such
orientation and spacing with respect to pRi TL-DNA-derived
plant-expressible TxCS as to be expressible in the plant cell under
control of those sequences. Also provided are novel strains of bacteria
containing and replicating T-DNA, the T-DNA being modified to contain an
inserted foreign structural gene in such orientation and spacing with
respect to a T-DNA-derived, plant-expressible TxCS as to be expressible in
a plant cell under control of said TxCS. Additionally, the invention
provides novel vectors having the ability to replicate in E. coli and
comprising T-DNA, and further comprising a foreign structural gene
inserted within T-DNA contained within the vector, in such manner as to
be expressible in a plant cell under control of a pRi TL-DNA TxCS.
Furthermore, strains of bacteria harboring said vectors are disclosed.

Much is known about the location, size, and function of many
transcripts activated when A. tumefaciens T-DNA regions are transferred
into the genome of plants (see Background). Most pTi TL-DNA open reading
frames (ORFs) correlate with known gene products. However, until the
disclosure of the present invention, the art knew little about the
number, size, and function of genes activated when the TL-DNA regions
from A. rhizogenes plasmids, such as pRiA4, are transferred into a plant
genome. Agropine synthase, tms-1, and tms-2 genes have been identified by
homology with pTi T-DNA in Ri plasmids, but these loci are located in
pRi TR-DNA (G. A. Huffman et al. (1984) J. Bacteriol. 157:269-276;
L. Jouanin (1984) Plasmid 12:91-102). The experimental work presented
herein is believed to be the first disclosure of a pRi TL-DNA sequence or
of any sequence homologous thereto. The availability of this sequence
will enable and otherwise facilitate work in the art of plant
transformation to express foreign structural genes and to engage in
other manipulations of pRi TL-DNA and pRi TL-DNA-derived sequences.
Without the newly disclosed pRi TL-DNA sequence, those of ordinary skill
in the art would be unable to use promoters and polyadenylation sites
contained therein to promote transcription and translation in plant
cells of foreign structural genes. The disclosed sequence reveals the
existence of previously unknown T-DNA ORFs and associated transcription
controlling sequences, and makes possible construction of recombinant DNA
molecules using promoters and polyadenylation sites from pRi TL-DNA genes
whose sequences were hitherto unknown and unavailable to the public. The
work presented herein is also believed to be the first disclosure of
developmental and phenotypic regulation of T-DNA genes. Results newly
disclosed herein will allow those of ordinary skill in the art to use
T-DNA transcription controlling sequences which are so regulated to
express heterologous foreign structural genes in transformed plants.
T-DNA genes known to the art before the present disclosure are not known
to be so regulated. Furthermore, knowledge of the pRi TL-DNA sequence
enables one to bring to utility promoters and polyadenylation sites that
are presently unrecognized; in the future, should a new pRi TL-DNA
transcript be discovered and mapped, the sequence disclosed herein will
permit the associated TxCSs to be combined with heterologous foreign
structural genes.
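The text does not state how the open reading frames in the disclosed TL-DNA sequence were identified. As a hedged illustration of the general idea only, the Python sketch below finds ATG-initiated ORFs on both strands of a DNA string. The function names and the min_codons cutoff are arbitrary assumptions; each stop codon is paired with the most upstream in-frame ATG, so nested ORFs sharing a stop are reported once, and ORFs that run off the end of the sequence are ignored.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 100) -> list[tuple[str, int, int]]:
    """Return (strand, start, end) for ATG...stop ORFs of >= min_codons codons.

    Coordinates are 0-based, half-open, on the forward strand; the codon
    count includes the stop codon.
    """
    seq = seq.upper()
    n = len(seq)
    orfs = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, n - 2, 3):
                codon = s[i:i + 3]
                if start is None:
                    if codon == "ATG":
                        start = i  # first in-frame ATG opens a candidate ORF
                elif codon in STOP_CODONS:
                    end = i + 3
                    if (end - start) // 3 >= min_codons:
                        if strand == "+":
                            orfs.append(("+", start, end))
                        else:
                            # Convert reverse-complement coordinates back.
                            orfs.append(("-", n - end, n - start))
                    start = None
    return sorted(orfs, key=lambda o: (o[1], o[0]))
```

For instance, with min_codons=5 the toy sequence "CCATGGCTGCTGCTTAAGG" yields [("+", 2, 17)]; with the default cutoff it yields no ORFs.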

The present invention comprises foreign structural genes under control
of pRi TL-DNA promoters expressible in plant cells, the promoter/gene
combination being inserted into a plant cell by any means known to the
art. More specifically, in its preferred embodiment the invention
disclosed herein comprises expression in plant cells of foreign
structural genes under control of certain pRi TL-DNA-derived
plant-expressible TxCSs, after introduction via T-DNA, that is to say, by
inserting the foreign structural gene into T-DNA under control of a
pRi TL-DNA promoter and/or ahead of a pRi TL-DNA polyadenylation site and
introducing the T-DNA containing the TxCS/structural gene combination
into a plant cell using known means. Once plant cells transformed to
contain a foreign structural gene expressible under control of a
pRi TL-DNA TxCS are obtained, plant tissues and whole plants can be
regenerated therefrom using methods and techniques well known in the
art. The regenerated plants are then reproduced by conventional means and
the introduced genes can be transferred to other strains and cultivars by
conventional plant breeding techniques. The invention in principle
applies to any introduction of a foreign structural gene combined with a
pRi TL-DNA promoter or polyadenylation site into any plant species into
which foreign DNA (in the preferred embodiment pTi T-DNA) can be
introduced and maintained by any means. In other words, the invention
provides a means for expressing a structural gene in a plant cell and is
not restricted to any particular means for introducing foreign DNA into
a plant cell and maintaining the DNA therein. Such means include, but are
not limited to, T-DNA-based vectors (including pTi-based vectors), viral
vectors, minichromosomes, non-T-DNA integrating vectors, and the like.



The invention is useful for genetically modifying plant cells, plant
tissues, and whole plants by inserting useful structural genes from other
species, organisms, or strains that change phenotypes of plants or plant
cells when expressed therein. Such useful structural genes include, but
are not limited to, genes conveying phenotypes such as improved tolerance
to extremes of heat or cold; improved tolerance to drought or osmotic
stress; improved resistance or tolerance to insect (e.g. insecticidal
toxins), arachnid, nematode, or epiphyte pests and fungal, bacterial, or
viral diseases, or the like; the production of enzymes or secondary
metabolites not normally found in said tissues or plants; improved
nutritional (e.g. storage proteins or lectins), flavor (e.g. sweet
proteins), or processing properties when used for fiber or human or
animal food; changed morphological traits or developmental patterns (e.g.
leaf hairs which protect the plant from insects, aesthetically pleasing
coloring or form, changed plant growth habits, dwarf plants, reduced time
needed for the plants to reach maturity, expression of a gene in a tissue
or at a time that gene is not usually expressed, and the like); male
sterility; improved photosynthetic efficiency (including lowered
photorespiration); improved nitrogen fixation; improved uptake of
nutrients; improved tolerance to herbicides; increased crop yield;
improved competition with other plants; and improved germplasm
identification by the presence of one or more characteristic nucleic acid
sequences, proteins, or gene products, or phenotypes however identified
(to distinguish a genetically modified plant of the present invention
from plants which are not so modified, to facilitate transfer of a linked
artificially introduced phenotype by other (e.g. sexual) means to other
genotypes, or to facilitate identification of plants protected by patents
or by plant variety protection certificates); selectable markers (i.e.
genes conveying resistance in cell or tissue culture to selective
agents); screenable markers; and the like.

The invention is exemplified by introduction and expression of a
structural gene for phaseolin, the major seed storage protein of the bean
Phaseolus vulgaris L., into plant cells. The introduction and expression
of the structural gene for phaseolin, for example, can be used to enhance
the protein content and nutritional value of forage or other crops. The
invention is also exemplified by the introduction and expression of a
lectin structural gene, in this case also obtained from P. vulgaris, into
plant cells. The introduction and expression of a novel lectin may be
used to change the nutritional or symbiotic properties of a plant
tissue. The invention is exemplified in yet other embodiments by the
introduction and expression of DNA sequences encoding thaumatin and its
precursors prothaumatin, prethaumatin, and preprothaumatin. Mature
thaumatin is a heat-labile, sweet-tasting protein found naturally in
katemfe (Thaumatococcus daniellii) which can be used to enhance the flavor
of vegetables which are eaten uncooked without significantly increasing
the caloric content of the vegetables. The invention is further
exemplified by introduction and expression of a structural gene for a
crystal protein from B. thuringiensis var. kurstaki HD-73 into plant
cells. The introduction and expression of the structural gene for an
insecticidal protein can be used to protect a crop from infestation with
insect larvae of species which include, but are not limited to, hornworm
(Manduca sp.), pink bollworm (Pectinophora gossypiella), European corn
borer (Ostrinia nubilalis), tobacco budworm (Heliothis virescens), and
cabbage looper (Trichoplusia ni). Applications of insecticidal protein
prepared from sporulating B. thuringiensis do not control insects such as
the pink bollworm in the field because of their particular life cycles and
feeding habits. A plant containing insecticidal protein in its tissues
will control this recalcitrant type of insect, thus providing an advantage
over prior insecticidal uses of B. thuringiensis. By incorporation of the
insecticidal protein into the tissues of a plant, the present invention
additionally provides an advantage over such prior uses by eliminating
instances of [...] buying and applying insecticidal preparations to a
field. Also, the present invention eliminates the need for careful timing
of application of such preparations, since small larvae are most sensitive
to insecticidal protein and the protein is always present, minimizing crop
damage that would otherwise result from preapplication larval foraging.
Other uses of the invention, exploiting the properties of other structural
genes introduced into various plant species, will be readily apparent to
those skilled in the art.



DESCRIPTION OF THE DRAWINGS

Figure 1 presents maps of the TL-DNA of the agropine Ri plasmid pRiHRI
and the strategy used for sequencing. The top line represents the TL-DNA
region from pRiHRI, and the filled boxes indicate locations of ORFs 1 to
18. The left and right TL-DNA borders are those identified from analysis
of TL-DNA integrated into Convolvulus arvensis clone 7 tissue. ORF
polarities are indicated by the position of enclosed boxes on the
continuous line; above indicates transcription from left to right and
below indicates transcription from right to left, i.e. having an mRNA
sequence complementary to that disclosed in Fig. 2. EcoRI and BamHI
restriction maps are below the ORF map. The complete nucleotide sequence
of the TL-DNA was determined from five subclones mapped below the
restriction maps: EcoRI 3a, BamHI 8a, Number 16, pLJO ("cosmid 40"), and
EcoRI 3b (see Example 2.2). Comparison of restriction enzyme site patterns
(L. Jouanin (1984) Plasmid 12:91-102) and of the overlapping sequenced
region (Number 16 and cosmid 40) indicates that the pRiHRI and pRiA4
TL-DNAs are essentially identical. Cleavage sites and direction of
sequence analysis are shown below each subclone, and horizontal arrows
indicate direction and distance of sequencing runs. Enzymes are
abbreviated as follows:
A, AvaI; Ac, AccI; B, BamHI; Bg, BglII; C, ClaI; D, DraI; E, EcoRI;
H, HindIII; K, KpnI; MsI, MstI; MsII, MstII; Na, NarI; Nc, NcoI; Ps, PstI;
Pv, PvuII; Sa, SalI; St, StuI; Xb, XbaI; Xh, XhoI; Xm, XmnI; and
Xo, XorII.
Figure 2 presents the nucleotide sequence of the TL-DNA region from the
A. rhizogenes agropine-type plasmid pRiHRI. The sequence starts 520 base
pairs (bp) to the left of the left TL-DNA/plant junction sequence
identified in C. arvensis clone 7 and extends 1135 bp to the right of the
clone 7 right TL-DNA/plant junction, a total of 21,126 bp.

Figure 3 is a schematic diagram, not drawn to scale, of the DNA
manipulation strategy utilized in the Examples. Sites susceptible to the
action of a restriction enzyme are indicated by that enzyme's name or
place of listing in a Table. For example, "T4c2" refers to an enzyme
listed in Table 4, column 2. A site that is no longer susceptible to the
enzyme is indicated by parentheses around the name of the enzyme. The
extent and polarity of an ORF is indicated by an arrow. Names of
plasmids, again sometimes designated by place of listing in a Table (e.g.
"T5c1" refers to a vector listed in Table 5, column 1), are within the
circular representations of the plasmids. Names of vectors, again
sometimes designated by a listing in a Table, are within the circular
representations of the plasmids. "Ex" refers to the Example which
describes a particular manipulation.

DETAILED DESCRIPTION OF THE INVENTION 

The following terms are defined in order to remove ambiguities as to the
intent or scope of their usage in the Specification and Claims.

TxCS: Transcription controlling sequences refers to a promoter/transcript
terminator combination flanking a particular structural gene or open
reading frame (ORF). The promoter and transcript terminator DNA sequences
flanking a particular inserted foreign structural gene need not be derived
from the same source genes (e.g. pairing two different pRi TL-DNA genes)
or the same taxonomic source (e.g. pairing sequences from pRi TL-DNA with
sequences from non-pRi-TL-DNA sources such as other types of T-DNA,
plants, animals, fungi, yeasts, and eukaryotic viruses). Therefore the
term TxCS refers to a combination of a claimed promoter with an unclaimed
transcript terminator, a combination of an unclaimed promoter with a
claimed polyadenylation site, or a combination of a promoter and a
polyadenylation site which are both claimed. Examples of non-pRi-TL-DNA
plant-expressible promoters which can be used in conjunction with a pRi
TL-DNA polyadenylation site include, but are not limited to, those from
genes for nos, ocs, phaseolin, and RuBPCase small subunit, and the 19S and
35S transcripts of cauliflower mosaic virus (CaMV).
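The pairing rule in this definition can be pictured as a small data structure. The following is a minimal Python sketch, offered only as an illustration: the class names, field names, and the free-text source label are hypothetical and do not come from the specification. It encodes the one constraint stated above, namely that the promoter and terminator may come from different genes or sources, provided at least one of them is derived from pRi TL-DNA.

```python
from dataclasses import dataclass

# Hypothetical label marking an element derived from pRi TL-DNA.
PRI_TL_DNA = "pRi TL-DNA"


@dataclass(frozen=True)
class Element:
    """One flanking element of a TxCS: a promoter or a transcript terminator."""
    name: str    # illustrative label only, e.g. "ORF 15 promoter"
    source: str  # e.g. PRI_TL_DNA, "pTi T-DNA", "CaMV", "plant"


@dataclass(frozen=True)
class TxCS:
    """A promoter/transcript terminator pair flanking a structural gene."""
    promoter: Element
    terminator: Element

    def has_pri_element(self) -> bool:
        # The two elements need not share a gene or a taxonomic source;
        # the only requirement modeled here is that at least one of them
        # is derived from pRi TL-DNA.
        return PRI_TL_DNA in (self.promoter.source, self.terminator.source)


hybrid = TxCS(Element("35S promoter", "CaMV"),
              Element("ORF 15 poly(A) site", PRI_TL_DNA))
print(hybrid.has_pri_element())
```

Under these assumptions, a CaMV 35S promoter paired with a pRi TL-DNA polyadenylation site qualifies, while a nos promoter paired with an ocs terminator (both pTi-derived) does not.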

Promoter: Refers to sequences at the 5'-end of a structural gene
involved in initiation of translation or transcription. Expression under
control of a pRi T-DNA promoter may take the form of direct expression, in
which the structural gene normally controlled by the promoter is removed
in part or in whole and replaced by the inserted foreign structural gene,
a start codon being provided either as a remnant of the pRi T-DNA
structural gene or as part of the inserted structural gene, or of fusion
protein expression, in which part or all of the structural gene is
inserted in correct reading frame phase within the existing pRi T-DNA
structural gene. In the latter case, the expression product is referred
to as a fusion protein. The promoter segment may itself be a composite of
segments derived from a plurality of sources, naturally occurring or
synthetic. Eukaryotic promoters are commonly recognized by the presence of
DNA sequences homologous to the canonical form 5'...TATAA...3' about
10-30 bp 5' to the location of the 5'-end of the mRNA (cap site). About
30 bp 5' to the TATAA another promoter sequence is often found, which is
recognized by the presence of DNA sequences homologous to the canonical
form 5'...CCAAT...3'. Translational initiation often begins at the first
5'...AUG...3' 3' from the cap site (see Example 1.5).
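The spacing rules in this definition can be turned into a simple scan. The Python sketch below is a hypothetical illustration, not a method from the specification: it assumes the cap site is already known (as a 0-based index), reports only exact matches to TATAA and CCAAT (real promoters are recognized by homology, not identity), and uses an assumed 20-40 bp window for "about 30 bp" 5' of the TATA box. It then reports the first ATG (AUG in the mRNA) 3' of the cap site.

```python
from __future__ import annotations


def find_all(seq: str, motif: str) -> list[int]:
    """Return the 0-based start of every exact (possibly overlapping) match."""
    hits = []
    pos = seq.find(motif)
    while pos != -1:
        hits.append(pos)
        pos = seq.find(motif, pos + 1)
    return hits


def promoter_features(seq: str, cap: int) -> dict:
    """Locate candidate promoter elements around a known cap site.

    `cap` is the 0-based index of the first transcribed base. Only exact
    motif matches are reported.
    """
    seq = seq.upper()
    # TATA box: a TATAA starting about 10-30 bp 5' of the cap site.
    tata = [p for p in find_all(seq, "TATAA") if 10 <= cap - p <= 30]
    ccaat = []
    if tata:
        # CCAAT box: about 30 bp 5' of the TATA box closest to the cap
        # (the 20-40 bp window is an assumption).
        t = tata[-1]
        ccaat = [p for p in find_all(seq, "CCAAT") if 20 <= t - p <= 40]
    # The first ATG 3' of the cap site is the likely translational start.
    atg = seq.find("ATG", cap)
    return {"tata": tata, "ccaat": ccaat, "first_atg": atg if atg >= 0 else None}


# Synthetic test sequence, not taken from the pRi TL-DNA sequence.
demo = "GG" + "CCAAT" + "G" * 25 + "TATAA" + "G" * 20 + "A" + "CCC" + "ATG" + "GCC"
print(promoter_features(demo, cap=57))
# {'tata': [32], 'ccaat': [2], 'first_atg': 61}
```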

Transcript terminator: Refers to any nucleic acid sequence capable of
determining the 3'-end of a eukaryotic messenger RNA (mRNA). The
transcript terminator DNA segment may itself be a composite of segments
derived from a plurality of sources, naturally occurring or synthetic, and
may be from a genomic DNA or an RNA-derived cDNA. Some eukaryotic RNAs,
e.g. histone mRNA (P. A. Krieg and D. A. Melton (1984) Nature
308:203-206), ribosomal RNA, and transfer RNA, are not 3'-terminated by
polyadenylic acid or by polyadenylation sites; it is intended that the
term transcript terminator include, but not be limited to, both nucleic
acid sequences determining the 3'-ends of such transcripts and
polyadenylation site sequences (see below).
Polyadenylation site: Refers to any nucleic acid sequence capable of
determining the 3'-end of a eukaryotic polyadenylated mRNA. After
transcriptional termination, polyadenylic acid "tails" are added to the
3'-end of most mRNA precursors. The polyadenylation site DNA segment may
itself be a composite of segments derived from a plurality of sources,
naturally occurring or synthetic, and may be from a genomic DNA or an
mRNA-derived cDNA. Polyadenylation sites are commonly recognized by the
presence of homology to the canonical form 5'...AATAAA...3', although
variation of distance, partial "read-thru", and multiple tandem canonical
sequences are not uncommon. It should be recognized that a canonical
"polyadenylation site" may in fact not actually cause polyadenylation per
se (N. Proudfoot (1984) Nature 307:412-413) and that sequences 3' to the
"AATAAA" and the 3'-end of the transcript may be needed (A. Gil and N. J.
Proudfoot (1984) Nature 312:473-474).
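Because polyadenylation sites are recognized by homology to AATAAA rather than by exact identity, a scan that tolerates mismatches is a natural illustration. The Python sketch below is hypothetical: the function names and the `max_mismatch` tolerance are assumptions, not rules from the text. As the definition itself cautions, a hit marks only a candidate signal, since sequences further 3' may also be required.

```python
from __future__ import annotations

CANONICAL = "AATAAA"


def mismatches(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def polya_signals(seq: str, max_mismatch: int = 0) -> list[tuple[int, str]]:
    """Return (0-based position, hexamer) for each AATAAA-like hexamer.

    max_mismatch > 0 admits near-canonical variants; the tolerance is an
    illustrative parameter only.
    """
    seq = seq.upper()
    k = len(CANONICAL)
    return [
        (i, seq[i:i + k])
        for i in range(len(seq) - k + 1)
        if mismatches(seq[i:i + k], CANONICAL) <= max_mismatch
    ]


# Synthetic 3' region: a stop codon, a canonical hexamer, and a one-mismatch variant.
tail = "TAA" + "GCGC" + "AATAAA" + "GCGCGC" + "ATTAAA" + "GC"
print(polya_signals(tail))                  # [(7, 'AATAAA')]
print(polya_signals(tail, max_mismatch=1))  # [(7, 'AATAAA'), (19, 'ATTAAA')]
```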

Foreign structural gene: As used herein, includes that portion of a gene
comprising a DNA segment coding for a foreign RNA, protein, polypeptide or
portion thereof, possibly including a translational start codon, but
lacking at least one other functional element of a TxCS that regulates
initiation or termination of transcription and initiation of translation,
commonly referred to as the promoter region and transcript terminator. As
used herein, the term foreign structural gene does not include pRi TL-DNA
structural genes unless the structural gene and the pRi TL-DNA
transcription controlling sequences combined with the structural gene are
derived from different pRi TL-DNA genes, i.e. unless the structural gene
and either a pRi promoter or a pRi polyadenylation site combined with the
structural gene are heterologous. (Note that such foreign functional
elements may be present after combination of the foreign structural gene
with a pRi TL-DNA TxCS, though, in embodiments of the present invention,
such elements may not be functional in plant cells.) A foreign structural
gene may encode a protein not normally found in the plant cell into which
the gene is introduced. Additionally, the term refers to copies of a
structural gene naturally found within the cell but artificially
introduced. A foreign structural gene may be derived in whole or in part
from sources including but not limited to eukaryotic DNA, prokaryotic DNA,
episomal DNA, plasmid DNA, plastid DNA, genomic DNA, cDNA, viral DNA, viral
cDNA, or chemically synthesized DNA. It is further contemplated that a
foreign structural
gene may contain one or more modifications in either the coding segments
or untranslated regions which could affect the biological activity or
chemical structure of the expression product, the rate of expression or
the manner of expression control. Such modifications include, but are not
limited to, mutations, insertions, deletions, and substitutions of one or
more nucleotides, and "silent" modifications that do not alter the
chemical structure of the expression product but which affect
intercellular localization, transport, excretion or stability of the
expression product. The structural gene may constitute an uninterrupted
coding sequence or it may include one or more introns, bounded by the
appropriate plant functional splice junctions, which may be obtained from
a synthetic or a naturally occurring source. The structural gene may be a
composite of segments derived from a plurality of sources, naturally
occurring or synthetic, coding for a composite protein, the composite
protein being foreign to the cell into which the gene is introduced and
expressed or being derived in part from a foreign protein. The foreign
structural gene may be a fusion protein, and in particular, may be fused
to all or part of a structural gene derived from the same ORF as was the
TxCS.

Plant tissue: Includes differentiated and undifferentiated tissues of
plants including, but not limited to, roots, shoots, pollen, seeds, tumor
tissue, such as crown galls, and various forms of aggregations of plant
cells in culture, such as embryos and calluses. The plant tissue may be
in planta or in organ, tissue, or cell culture.

Plant cell: As used herein, includes plant cells in planta and plant
cells and protoplasts in culture.

Production of a genetically modified plant, plant seed, plant tissue,
or plant cell expressing a foreign structural gene under control of a pRi
T-DNA TxCS, and especially a pRi TL-DNA-derived TxCS, combines the
specific teachings of the present disclosure with a variety of techniques
and expedients known in the art. In most instances, alternative
expedients exist for each stage of the overall process. The choice of
expedients depends on variables such as the choice of the basic vector
system for the introduction and stable maintenance of the pRi TL-DNA
TxCS/structural gene combination, the plant species to be modified and the
desired regeneration strategy, and the particular foreign structural gene
to be used, all of which present alternative process steps which those of
ordinary skill are able to select and use to achieve a desired result.
For instance, although the starting point for obtaining pRi TL-DNA TxCSs
is exemplified in the present application by pRi TL-DNA isolated from
pRiA4 and pRiHRI, DNA sequences of other homologous agropine-type Ri
plasmids might be substituted as long as appropriate modifications are
made to the TxCS isolation and manipulation procedures. Additionally,
T-DNA genes from other types of pRi TL-DNA homologous to the
agropine-type pRi TL-DNA genes having TxCSs disclosed herein may be
substituted, again with appropriate modifications of procedural details.
Homologous genes may be identified by those of ordinary skill in the art
by the ability of their nucleic acids to cross-hybridize under conditions
of stringency appropriate to detect 70% homology; such conditions are well
understood in the art. It will be understood that there may be minor
sequence variations within gene sequences utilized or disclosed in the
present application. These variations may be determined by standard
techniques to enable those of ordinary skill in the art to manipulate and
bring into utility the T-DNA promoters and transcript terminators of such
homologous genes. (Homologs of foreign structural genes may be
identified, isolated, sequenced, and manipulated in a similar manner as
homologs of the pRi genes of the present invention.) As novel means are
developed for the stable insertion of foreign genes in plant cells, those
of ordinary skill in the art will be able to select among those alternate
process steps to achieve a desired result. The fundamental aspects of the
invention are the nature and structure of pRi T-DNA genes and their use as
a means for expression of a foreign structural gene in a plant genome. The
remaining steps of the preferred embodiment for obtaining a genetically
modified plant include inserting the pRi TL-DNA TxCS/structural gene
combination into T-DNA, transferring the modified T-DNA to a plant cell
wherein the modified T-DNA becomes stably integrated as part of the plant
cell genome, techniques for in vitro culture and eventual regeneration
into whole plants, which may include steps for selecting and detecting
transformed plant cells and steps of transferring the introduced gene
from the originally transformed strain into commercially acceptable
cultivars.

An advantage, which will be readily understood by those skilled in the
art, of use of transcription controlling sequences disclosed herein for
controlling structural gene expression over previously published T-DNA
TxCSs is that transcription of many pRi T-DNA ORFs is phenotypically and
developmentally regulated (see Example 1.9). pTi T-DNA genes are not
known to be so regulated. Transcripts of ORFs 8, 11, 13, and 15 are more
prevalent in roots than leaves, with the case of ORF 15 being particularly
striking, while ORF 12 expression is specific to leaves and to a
particular phenotype (V; see Example 1.9). Therefore, choice of a
particular pRi TL-DNA TxCS allows modulation of expression of a structural
gene with which the TxCS is combined. For example, should one want
expression of a structural gene to be much higher in roots than leaves,
ORF 15 provides the TxCS of choice.



A principal feature of the present invention in its preferred embodiment
is the construction of T-DNA having an inserted foreign structural gene
under control of a pRi TL-DNA TxCS, i.e., between a promoter and a
polyadenylation site, as these terms have been defined, supra, at least
one of which is derived from pRi TL-DNA. The structural gene must be
inserted in correct position and orientation with respect to the desired
pRi TL-DNA promoter. Position has two aspects. The first relates to
which side of the promoter the structural gene is inserted on. It is
known that the majority of promoters control initiation of transcription
and translation in one direction only along the DNA. The region of DNA
lying under promoter control is said to lie "downstream" or alternatively
"behind" or "3' to" the promoter. Therefore, to be controlled by the
promoter, the foreign structural gene must be inserted "downstream" from
the promoter. The second aspect of position refers to the distance, in
base pairs, between known functional elements of the promoter, for
example the transcription initiation site, and the translational start
site of the structural gene. Substantial variation appears to exist with
regard to this distance, from promoter to promoter. Therefore, the
structural requirements in this regard are best described in functional
terms. As a first approximation, reasonable operability can be obtained
when the distance between the promoter and the inserted foreign
structural gene is similar to the distance between the promoter and the
T-DNA gene it normally controls. Orientation refers to the directionality
of the structural gene. That portion of a structural gene which
ultimately codes for the amino terminus of the foreign protein is termed
the 5'-end of the structural gene, while that end which codes for amino
acids near the carboxyl end of the protein is termed the 3'-end of the
structural gene. Correct orientation of the foreign structural gene is
with the 5'-end thereof proximal to the promoter. An additional
requirement in the case of constructions leading to fusion protein
expression is that the insertion of the foreign structural gene into the
pRi TL-DNA promoter-donated structural gene sequence must be such that the
coding sequences of the two genes are in the same reading frame phase, a
structural requirement which is well understood in the art. An exception
to this requirement exists in the case where an intron separates coding
sequences derived from a foreign structural gene from the coding
sequences of the pRi TL-DNA structural gene. In that case, both
structural genes must be provided with compatible splice sites, and the
intron splice sites must be so positioned that the correct reading frame
for the pRi TL-DNA promoter-donated structural gene and the foreign
structural gene is restored in phase after the intron is removed by
post-transcriptional processing.
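The reading-frame requirement for a direct (intron-free) fusion reduces to simple arithmetic on lengths plus a check for premature stop codons. The Python sketch below is a hypothetical illustration under stated assumptions: `donor` runs from the pRi-derived ATG up to the junction, `foreign` runs from the junction through the foreign stop codon, and `foreign_frame` (0, 1 or 2) is where the first complete foreign codon begins within `foreign`. The intron-spanning case described above is not modeled.

```python
from __future__ import annotations

STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons(seq: str) -> list[str]:
    """Split a sequence into consecutive complete triplets from position 0."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def is_in_frame_fusion(donor: str, foreign: str, foreign_frame: int = 0) -> bool:
    """Check a direct (intron-free) fusion for reading-frame phase.

    donor:   pRi-derived coding sequence, from its ATG up to the junction.
    foreign: foreign coding sequence, from the junction through its stop codon.
    foreign_frame: offset of the first complete foreign codon within `foreign`.
    """
    if (len(donor) + foreign_frame) % 3 != 0:
        return False  # the foreign codons would be read out of phase
    fused = codons((donor + foreign).upper())
    # A stop codon before the final triplet would truncate the fusion protein.
    return all(c not in STOP_CODONS for c in fused[:-1])


print(is_in_frame_fusion("ATGGCC", "AAAGGGTAA"))                  # True
print(is_in_frame_fusion("ATGGC", "AAAGGGTAA"))                   # False
print(is_in_frame_fusion("ATGGC", "CAAAGGGTAA", foreign_frame=1)) # True
```

The length test captures "same reading frame phase"; the stop-codon test catches a junction that is in phase but introduces a premature stop at the suture.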
Differences in rates of expression or developmental control may be
observed when a given foreign structural gene is inserted under control of
different pRi TL-DNA TxCSs. Rates of expression may also be greatly
influenced by the details of the resultant mRNA's secondary structure,
especially stem-loop structures. In the case of fusion proteins, the
stability, ability to be excreted, intercellular localization,
intracellular localization, solubility, target specificity, and other
functional properties of the expressed protein itself may vary depending
upon the insertion site, the length and properties of the segment of pRi
TL-DNA protein included within the fusion protein, and mutual interactions
between the components of the fusion protein that affect its folded
configuration, all of which present numerous opportunities to manipulate
and control the functional properties of the foreign protein product,
depending upon the desired physiological properties within the plant cell,
plant tissue, and whole plant. Similarly to the promoter, the
polyadenylation site must be located in correct position and orientation
relative to the 3'-end of the coding sequence. Fusion proteins are also
possible between the 3'-end of the foreign structural gene protein and a
polypeptide encoded by the DNA which serves as a source of the
polyadenylation site.
A TxCS comprises two major functionalities: a promoter, which is
absolutely necessary for gene expression, and a transcript terminator,
being in the preferred embodiment a polyadenylation site, positioned 3' to
the structural gene. Although as exemplified herein these two portions of
the TxCS are obtained from the same gene, this is not a requirement of the
present invention. These 5' and 3' sequences may be obtained from diverse
pRi T-DNA genes, especially pRi TL-DNA genes, or one of these sequences
may even be obtained from a non-pRi T-DNA gene. For instance, a promoter
may be taken from a pRi TL-DNA gene while the polyadenylation site may
come from a plant gene.

In the Examples, a foreign structural gene is nested within a pRi TL-DNA
TxCS, suturing the structural gene into the TxCS at NdeI sites and placing
the entire TxCS/structural gene combination between a pair of BamHI
sites. As will be apparent to those of ordinary skill in the art, the
TxCS/gene combination may be placed between any restriction sites
convenient for removing the combination from the plasmid it is carried on
and convenient for insertion into the plant transformation or shuttle
vector of choice. Alternatives to the use of paired NdeI sites
(5'...CATATG...3') at the ATG translational start include, but are not
limited to, use of ClaI (5'...(not G)ATCGAT(G)...3') or NcoI
(5'...CCATGG...3') sites. As will be understood by persons skilled in the
art, other sites may be used for the promoter/structural gene suture as
long as the sequence at the junction remains compatible with translational
and transcriptional functions. An alternative to the suture of the
promoter to the foreign structural gene at the ATG translational start is
suturing at the transcriptional start or cap site. An advantage,
especially for eukaryotic structural genes, of the use of this location is
that the secondary (stem-loop) structure of the foreign structural gene
mRNA will not be disrupted, thereby leading to an mRNA having
translational activity more nearly resembling the activity observed in
the organism which was the source of the gene. The restriction sites at
the 5'- and 3'-ends of the structural gene need not be compatible. Use of
sites cut by two different restriction enzymes at the two TxCS/structural
gene junctions will automatically correctly orient the structural gene
when it is inserted between the TxCS elements, though use of an extra
restriction enzyme may necessitate removal of an additional set of
inconvenient restriction sites within the TxCS and the structural gene.
The use of a single restriction enzyme to link both a promoter and a
polyadenylation site to a particular structural gene is not required.
Convenient sites within the pRi TL-DNA structural gene and 3' to the
translational stop of the foreign structural gene may be used. When these
sites have incompatible ends, they may be converted to blunt ends by
methods well known in the art and blunt-end ligated together.
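Each of the suture sites named above overlaps the ATG start codon, so whether a given start codon can be sutured with one of them can be checked by lining the recognition sequence up over the ATG. The Python sketch below is a hypothetical helper, not the procedure used in the Examples. It uses only the three recognition sequences given in the text. For ClaI it enforces the "(not G)" condition: a preceding G would form GATC, a Dam methylation site that can block ClaI cleavage of DNA grown in dam+ E. coli. The `count_sites` helper reflects the point about removing inconvenient extra sites; one strand suffices because all three sites are palindromic.

```python
from __future__ import annotations

# Recognition sequence and the index of the A of the ATG start codon within it.
START_SITES = {
    "NdeI": ("CATATG", 3),
    "NcoI": ("CCATGG", 2),
    "ClaI": ("ATCGATG", 4),  # ATCGAT followed by the G of ATG
}


def sites_over_start(seq: str, atg_pos: int) -> list[str]:
    """Enzymes whose site (as listed above) overlaps the ATG at atg_pos (0-based)."""
    seq = seq.upper()
    found = []
    for enzyme, (site, offset) in START_SITES.items():
        start = atg_pos - offset
        if start < 0 or seq[start:start + len(site)] != site:
            continue
        # ClaI requires a non-G base 5' of ATCGAT (GATC would be Dam-methylated).
        if enzyme == "ClaI" and start > 0 and seq[start - 1] == "G":
            continue
        found.append(enzyme)
    return found


def count_sites(seq: str, site: str) -> int:
    """Count occurrences of a palindromic site; a suture site should be unique."""
    seq = seq.upper()
    return sum(1 for i in range(len(seq) - len(site) + 1) if seq.startswith(site, i))


print(sites_over_start("GGCATATGGCC", 5))  # ['NdeI']
print(sites_over_start("TATCGATGC", 5))    # ['ClaI']
print(sites_over_start("GATCGATGC", 5))    # []
```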

Location of the TxCS/foreign structural gene combination insertion site
within T-DNA or a T-DNA-derived vector is not critical as long as the
transfer function of the T-DNA borders and any other necessary vector
elements (e.g. a selectable or screenable marker) are not disrupted. The
T-DNA into which the TxCS/structural gene combination is inserted may be
obtained from any of the TIP plasmids, including both Ti and Ri plasmids.
The TxCS/structural gene combination is inserted by standard techniques
well known to those skilled in the art. The orientation of the inserted
plant gene with respect to the direction of transcription and translation
of endogenous T-DNA or vector genes is not critical either; either of the
two possible orientations is functional. Differences in rates of
expression might be observed when a given gene is inserted at different
locations within T-DNA.

A convenient means for inserting a TxCS/foreign structural gene
combination into T-DNA involves the use of a shuttle vector, as described
in the Background. An Agrobacterium strain transformed by a shuttle vector
is preferably grown under conditions which permit selection of a
double-homologous recombination event which results in replacement of a
pre-existing segment of a Ti or Ri plasmid with a segment of T-DNA of the
shuttle vector. However, it should be noted that the present invention is
not limited to the introduction of the TxCS/structural gene combination
into T-DNA by a double homologous recombination mechanism; a homologous
recombination event with a shuttle vector (perhaps having only a single
continuous region of homology with the T-DNA) at a single site will also
prove an effective means for inserting that combination into T-DNA, as
will insertion of a combination-carrying bacterial transposon.

An alternative to the shuttle vector strategy involves the use of
plasmids comprising T-DNA or modified T-DNA, into which a TxCS/foreign
structural gene is inserted, said plasmids lacking vir genes and being
capable of independent replication in an Agrobacterium strain. As
reviewed in the Background, the T-DNA of such plasmids can be transferred
from an Agrobacterium strain (e.g. A. rhizogenes, A. tumefaciens, or
derivatives thereof) to a plant cell provided the Agrobacterium strain
contains certain trans-acting vir genes whose function is to promote the
transfer of T-DNA to a plant cell. Plasmids that contain T-DNA and are
able to replicate independently in an Agrobacterium strain are herein
termed "sub-TIP" plasmids. A spectrum of variations is possible in which
the sub-TIP plasmids, which may be derived from Ri or Ti plasmids, differ
in the amount of T-DNA contained. A "mini-TIP" plasmid retains all of the
T-DNA from a TIP. "Micro-TIP" plasmids are deleted for all T-DNA but that
surrounding the T-DNA borders, the remaining portions being the minimum
necessary for the sub-TIP plasmid to be transferable and integratable in
the host cell. Sub-TIP plasmids are advantageous in that they are
relatively small and relatively easy to manipulate directly, eliminating
the need to transfer the gene to T-DNA from a shuttle vector by homologous
recombination. After the desired structural gene has been inserted, they
can easily be introduced directly into an Agrobacterium cell containing
the trans-acting genes that promote T-DNA transfer. Introduction into an
Agrobacterium strain is conveniently accomplished either by transformation
of the Agrobacterium strain or by conjugal transfer from a donor bacterial
cell, the techniques for which are well known to those of ordinary skill.
pRi T-DNA TxCS/structural gene combinations may be combined with
pTi-derived Ti plasmids or sub-TIP vectors.

Modified T-DNA carrying a pRi TL-DNA TxCS/structural gene combination can
be transferred to plant cells by any technique known in the art (see
Background). The resultant transformed cells must be selected or screened
to distinguish them from untransformed cells. Selection is most readily
accomplished by providing a selectable marker known to the art
incorporated into the T-DNA in addition to the TxCS/foreign structural
gene combination. Indeed, a pRi TL-DNA TxCS can be a component of such a
marker. In addition, the T-DNA provides endogenous markers such as the
gene or genes controlling hormone-independent growth of Ti-induced tumors
in culture, the gene or genes controlling abnormal morphology of
Ri-induced tumor roots, and genes that control resistance to toxic
compounds such as amino acid analogs, such resistance being provided by an
opine synthase (e.g. ocs). Screening methods well known to those skilled
in the art include assays for opine production, specific hybridization to
characteristic RNA or T-DNA sequences, or immunological assays.
Additionally, the phenotype of an expressed foreign gene can be used to
identify transformed plant tissue (e.g. insecticidal properties of the
crystal protein).

Although the preferred embodiment of this invention uses a T-DNA-based
Agrobacterium-mediated system for incorporation of the TxCS/foreign
structural gene combination into the genome of the plant which is to be
transformed, other means for transferring and incorporating the gene are
also included within the scope of this invention. Other means for the
stable incorporation of the combination into a plant genome additionally
include, but are not limited to, use of vectors based upon viral genomes
(e.g. see N. Brisson et al. (1984) Nature 310:511-514), minichromosomes,
transposons, and homologous or nonhomologous recombination into plant
chromosomes. Alternate forms of delivery of these vectors into a plant
cell additionally include, but are not limited to, direct uptake of
nucleic acid (e.g. see J. Paszkowski et al. (1984) EMBO J. 3:2717-2722),
fusion with vector-containing liposomes or bacterial spheroplasts,
microinjection, and encapsidation in viral coat protein followed by an
infection-like process. After introduction into a plant cell of a pRi
TL-DNA TxCS/structural gene combination, the combination will be
contained by a plant cell. Furthermore, the combination will be flanked by
plant DNA, unless utilizing a nonintegrating vector, e.g. a virus or
minichromosome.

Regeneration of transformed cells and tissues is accomplished by 
resort to known techniques. An object of the regeneration step is to 
obtain a whole plant that grows and reproduces normally but which retains 
integrated T-DNA. The techniques of regeneration vary somewhat according 
to principles known in the art, depending upon the origin of the T-DNA, 
the nature of any modifications thereto and the species of the transformed 
plant. In many plant species, cells transformed by pRi-type T-DNA are 
readily regenerated, using techniques well known to those of ordinary 
skill, without undue experimentation. Plant cells transformed by pTi-type 
T-DNA can be regenerated, in some instances, by the proper manipulation of
hormone levels in culture. Ti-transformed tissue is, however, most easily
regenerated if the T-DNA has been mutated in one or both of the tmr and
tms genes. It is important to note that if the mutations in tmr and tms
are introduced into T-DNA by double homologous recombination with a
shuttle vector, the incorporation of the mutation must be selected in a
different manner than the incorporation of the TxCS/structural gene
combination; e.g. one might select for tmr and tms inactivation by
chloramphenicol resistance while one might select for TxCS/foreign gene
integration by kanamycin resistance. The inactivation of the tms and tmr
loci may be accomplished by an insertion, deletion, or substitution of one
or more nucleotides within the coding regions or promoters of these genes,
the mutation being designed to inactivate the promoter or disrupt the
structure of the encoded proteins (e.g. the T-DNA of NRRL B-15821, or the
pTi of A3004, L. W. Ream et al. (1983) Proc. Natl. Acad. Sci. USA
80:1660-1664). Resultant transformed cells are able to regenerate plants
which carry integrated T-DNA and express T-DNA genes, such as an opine
synthase, and also express an inserted pRi TL-DNA TxCS/structural gene
combination. These serve as parental plant material for normal progeny
plants carrying and expressing the pRi TL-DNA TxCS/heterologous foreign
structural gene combination, and for seeds containing the combination, in
the preferred embodiments the combination being integrated into a plant
chromosome and flanked by plant DNA.

The genotype of the plant tissue transformed is often chosen for the
ease with which its cells can be grown and regenerated in in vitro culture
and for susceptibility to the selective agent to be used. Should a
cultivar of agronomic interest be unsuitable for these manipulations, a
more amenable variety is first transformed. After regeneration, the newly
introduced TxCS/foreign structural gene combination is readily transferred
to the desired agronomic cultivar by techniques well known to those
skilled in the arts of plant breeding and plant genetics. Sexual crosses
of transformed plants with the agronomic cultivars yield initial hybrids.
These hybrids can then be back-crossed with plants of the desired genetic
background. Progeny are continuously screened and selected for the
continued presence of integrated T-DNA or for the new phenotype resulting
from expression of the inserted foreign gene. In this manner, after a
number of rounds of back-crossing and selection, plants can be produced
having a genotype essentially identical to the agronomically desired
parents with the addition of an inserted pRi T-DNA promoter/foreign
structural gene combination or of a foreign structural gene/polyadenylation
site combination.
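How many rounds of back-crossing are "a number of rounds" can be estimated with the standard textbook expectation that each backcross to the recurrent parent halves the remaining donor genome. The sketch below is illustrative only (the function name is hypothetical) and assumes unlinked, neutral loci; it ignores donor DNA that stays linked to the selected insert.

```python
def recurrent_parent_fraction(backcrosses: int) -> float:
    """Expected fraction of the genome derived from the recurrent (agronomic) parent.

    backcrosses = 0 is the initial F1 hybrid (50%); each further backcross
    to the recurrent parent halves the remaining donor contribution.
    Assumes unlinked loci, i.e. ignores DNA linked to the selected insert.
    """
    if backcrosses < 0:
        raise ValueError("backcrosses must be non-negative")
    return 1.0 - 0.5 ** (backcrosses + 1)


for n in range(6):
    print(f"BC{n}: {recurrent_parent_fraction(n):.4f}")
```

Under these assumptions about five backcrosses bring the expected recurrent-parent contribution above 98%, which is why selection for the insert is combined with repeated backcrossing.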



EXAMPLES 

The following Examples are presented for the purpose of illustrating 
specific embodiments within the scope of the present invention without
limiting the scope, the scope being defined by the Claims. Numerous
variations will be readily apparent to those of ordinary skill in the art. 

These Examples utilize many techniques well known and accessible to
those skilled in the arts of molecular biology and manipulation of TIPs
and Agrobacterium; such methods are fully described in one or more of the
cited references if not described in detail herein. Enzymes are obtained
from commercial sources and are used according to the vendor's
recommendations or other variations known to the art. Reagents, buffers
and culture conditions are also known to those in the art. Reference works
containing such standard techniques include the following: R. Wu, ed.
(1979) Meth. Enzymol. 68; R. Wu et al., eds. (1983) Meth. Enzymol. 100 and
101; L. Grossman and K. Moldave, eds. (1980) Meth. Enzymol. 65; J. H.
Miller (1972) Experiments in Molecular Genetics; R. Davis et al. (1980)
Advanced Bacterial Genetics; R. F. Schleif and P. C. Wensink (1982)
Practical Methods in Molecular Biology; and T. Maniatis et al. (1982)
Molecular Cloning. Additionally, R. F. Lathe et al. (1983) Genet. Engin.
4:1-56, make useful comments on DNA manipulations.

Textual use of the name of a restriction endonuclease in isolation,
e.g. "Bcl I", refers to use of that enzyme in an enzymatic digestion,
except in a diagram where it can refer to the site of a sequence
susceptible to action of that enzyme, e.g. a restriction site. In the
text, restriction sites are indicated by the additional use of the word
"site", e.g. "Bcl I site". The additional use of the word "fragment", e.g.
"Bcl I fragment", indicates a linear double-stranded DNA molecule having
ends generated by action of the named enzyme (e.g. a restriction
fragment). A phrase such as "Bcl I/Sma I fragment" indicates that the
restriction fragment was generated by the action of two different enzymes,
here Bcl I and Sma I, the two ends resulting from the action of different
enzymes. Note that the ends will have the characteristics of being "sticky"
(i.e. having a single-stranded protrusion capable of base-pairing with a
complementary single-stranded oligonucleotide) or "blunt", and that the
sequence of a sticky end will be determined by the specificity of the
enzyme which produces it.
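The dependence of end structure on enzyme specificity can be made concrete for palindromic recognition sites. In the minimal sketch below, the helper name is hypothetical and the cut positions are those commonly tabulated in enzyme references such as REBASE, not taken from this document. Given the top-strand cut position, the bottom-strand cut is its mirror image, and the bases between the two cuts form the overhang.

```python
def end_structure(site: str, top_cut: int) -> tuple[str, str]:
    """Classify the ends produced by an enzyme with a palindromic recognition site.

    site    -- recognition sequence, top strand, written 5'->3'
    top_cut -- number of site bases lying 5' of the cut on the top strand
    Because the site is palindromic, the bottom strand is cut at the mirror
    position, len(site) - top_cut (counted along the top strand).
    """
    bottom_cut = len(site) - top_cut
    if top_cut < bottom_cut:
        return "5' overhang", site[top_cut:bottom_cut]
    if top_cut > bottom_cut:
        return "3' overhang", site[bottom_cut:top_cut]
    return "blunt", ""


# Cut positions as commonly tabulated: G^GATCC, T^GATCA, CA^TATG, CCC^GGG, CTGCA^G.
ENZYMES = {
    "BamH I": ("GGATCC", 1),
    "Bcl I": ("TGATCA", 1),
    "Nde I": ("CATATG", 2),
    "Sma I": ("CCCGGG", 3),
    "Pst I": ("CTGCAG", 5),
}

for name, (site, cut) in ENZYMES.items():
    kind, overhang = end_structure(site, cut)
    print(f"{name:7s} {kind:12s} {overhang}")
```

BamH I and Bcl I both leave 5'-GATC protrusions, so their sticky ends can anneal to each other, while a Bcl I/Sma I fragment carries one GATC sticky end and one blunt end.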

In the Examples and Tables, the underlining of a particular nucleotide
in a primer or other sequence indicates the nucleotide which differs from
the naturally found sequence, being an insertion or substitution of one or
more nucleotides. The use of lower case for two adjacent nucleotides
brackets one or more nucleotides that have been deleted from the native
sequence. Unless otherwise noted, all oligonucleotide primers are
phosphorylated at their 5'-ends, are represented 5'-to-3', and are
synthesized and used as referenced in Example 5.

Plasmids are usually prefaced with a "p", e.g. pRiA4 or p8-8, and a
plasmid harbored within a strain is indicated parenthetically after the
strain name, e.g. A. rhizogenes (pRiA4) or E. coli HB101 (p8-8).
Self-replicating DNA molecules derived from the bacteriophage M13 are
prefaced by an "m", e.g. mWB2341, and may be in either single-stranded or
double-stranded form. A. tumefaciens (pTi15955) is on deposit as ATCC
15955, E. coli C600 (pRK-203-Kan-103-lec) as NRRL B-15821, E. coli HB101
(pLJ40) as NRRL B-15957, and E. coli HB101 (EcoR I e36) as NRRL B-15958
(as deposited, EcoR I e36 was designated EcoR I 3a); other deposited
strains are listed in column 3 of Table 7.

The DNA constructions described in these Examples have been designed
to enable any one of the eukaryotic TxCSs of pRi TL-DNA to be combined
with any of four foreign structural genes. Towards that end, the
structural genes, the TxCSs, and the TxCS/structural gene combinations
have been placed on DNA "cassettes", having the properties that, after
initial modifications have been made, any structural gene may be readily
inserted into any TxCS without further modification, and any
TxCS/structural gene combination may be isolated by a simple procedure
applicable to all such combinations. All combinations are thereby
equivalent when being inserted into the plant transformation vector of
choice. The initial modifications of the TxCSs are all analogous to each
other, and the initial modifications of the structural genes are also all
analogous to each other. These Examples often involve the use of a common
strategy for multiple constructions that differ only in items such as
choice of restriction enzymes, DNA fragment size, ORFs encoded, plasmids
generated or used as starting material, specific numbers and sequences of
oligonucleotides used for mutagenesis, sources of plasmids, and enzyme
reactions utilized. For the sake of brevity, the DNA manipulations and
constructions are generally described once, the differing items being
detailed by reference to a particular column in a particular Table, a
particular series of manipulations used in a particular construction
occupying horizontal lines within that Table. One combination, the ORF 11
TxCS with the crystal protein structural gene, is also detailed in the
text.

The following is an outline, diagrammed schematically in Figure 3, of
a preferred strategy used to make the exemplified DNA constructions
detailed in Examples 3 through 6. Endogenous Nde I sites are removed from
the M13-based vector mWB2341, resulting in a vector designated
mWB2341(Nde) (Example 3.1). Large fragments of T-DNA are introduced into
mWB2341(Nde) in a manner that also eliminates the vector's BamH I site
(Example 3.2). Endogenous T-DNA Nde I and BamH I sites are then removed
(Example 3.3) and novel sites are introduced. Nde I sites are introduced
at and near the translational start and stop sites, respectively, so that
a foreign structural gene on an Nde I fragment may replace the endogenous
ORF structural gene. BamH I sites are introduced approximately 0.3 kbp 5'
to and 3' from the transcriptional start and stop signals, respectively,
so that the TxCS/structural gene combination eventually constructed may be
removed on a BamH I fragment (Example 3.4). The structural genes, which
fortuitously have no internal Nde I or BamH I sites, are introduced into
mWB2341(Nde) (Example 4.1) and Nde I sites are introduced at and after the
translational start and stop sites (Examples 4.2 and 4.3). The structural
genes are removed from their vectors on "DNA cassettes" by digestion with
Nde I and are inserted into any desired TxCS which has had its endogenous
structural gene removed by Nde I digestion (Example 6.1). The TxCS/foreign
structural gene combinations are then removed from their vector by
digestion with BamH I and inserted into the plant transformation vectors of
choice (Example 6.2). It is recognized that construction strategies 
utilizing fortuitously located restriction sites might be designed by 
persons of ordinary skill which might be simpler for some particular 
TxCS/structural gene combination than the generalized DNA cassette 
strategy utilized herein; however, DNA cassettes are a better approach 
when one is trying to achieve flexibility in the choice and matching of
many diverse TxCSs and structural genes. 
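The cassette strategy depends on each structural gene lacking internal Nde I and BamH I sites, so that the only Nde I sites are the engineered ones at the start and stop. A minimal sketch of that check follows; the helper names and the example sequences are hypothetical, and because both recognition sites are palindromic, scanning one strand finds every occurrence.

```python
NDE_I = "CATATG"   # Nde I recognition site (palindromic)
BAMH_I = "GGATCC"  # BamH I recognition site (palindromic)


def site_positions(seq: str, site: str) -> list[int]:
    """1-based start positions of every occurrence of a palindromic site.

    A palindromic site reads the same on both strands, so scanning one
    strand finds every occurrence.
    """
    s = seq.upper()
    k = len(site)
    return [i + 1 for i in range(len(s) - k + 1) if s[i:i + k] == site]


def cassette_ready(coding_seq: str) -> bool:
    """True if a coding sequence (ATG ... stop) has no internal Nde I or BamH I site."""
    return not site_positions(coding_seq, NDE_I) and not site_positions(coding_seq, BAMH_I)


print(cassette_ready("ATGAAACCCGGGTAA"))       # no internal sites
print(site_positions("ATGCATATGTAA", NDE_I))   # an internal Nde I site
```

A gene that fails this check would need its internal sites removed by site-specific mutagenesis before it could be moved as an Nde I cassette.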

Example 1 

This Example provides disclosure, analysis, and discussion of the 
pRi TL-DNA sequencing results.

1.1 Summary of results 

pRi TL-DNA was sequenced and eighteen open reading frames (ORFs) were
found, two of which (7 and 18) were clearly prokaryotic in nature. Eleven
ORFs had canonical eukaryotic promoter and polyadenylation elements
(ORFs 1, 2, 3, 6, 8, 11, 12, 13, 14, 15 and 16). These ORFs were
distributed within an approximately 19.4 kilobase pair (kbp) segment of
pRi TL-DNA integrated into the genome of C. arvensis clone 7. DNA encoding
ORFs 8, 11, 12, 13, and 15 was observed to be transcribed in tobacco.

1.2 Sequence of pRi TL-DNA

A physical map of the pRi TL-DNA region is shown in Figure 1 along
with pRi subclones and the nucleotide sequencing strategy used.
Nine-tenths of the sequence obtained was determined from both DNA strands,
the remaining tenth being sequenced more than once from the same DNA
strand. A nucleotide sequence of 21,126 base pairs (bp) was obtained,
which included a 19.4 kbp pRi TL-DNA region identified in the genome of
C. arvensis clone 7, and is presented in Figure 2, 5'-to-3' corresponding
to left-to-right as mapped in Fig. 1. DNA was sequenced from the 5'-end of
BamH I fragment 32 to about 2216 bp into EcoR I fragment 3b (3'-end) (see
Fig. 1). The cleavage sites for over seventy restriction enzymes were
determined; cleavage positions for enzymes with fewer than nineteen sites
are listed in Table 1.
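The statement that nine-tenths of the sequence was determined from both strands is a coverage calculation over the assembled coordinates. The minimal sketch below shows one way to compute it, assuming each read is recorded as a 1-based inclusive interval on the assembly; the function name and the read intervals in the usage line are hypothetical.

```python
def strand_coverage(length, forward_reads, reverse_reads):
    """Fraction of positions read on both strands and on at least one strand.

    Reads are (start, end) pairs, 1-based and inclusive, in the coordinates
    of the assembled sequence.
    """
    fwd = [False] * (length + 1)  # index 0 unused
    rev = [False] * (length + 1)
    for reads, covered in ((forward_reads, fwd), (reverse_reads, rev)):
        for start, end in reads:
            for p in range(max(start, 1), min(end, length) + 1):
                covered[p] = True
    both = sum(fwd[p] and rev[p] for p in range(1, length + 1))
    either = sum(fwd[p] or rev[p] for p in range(1, length + 1))
    return both / length, either / length


# Hypothetical reads on a 1,000 bp stretch.
print(strand_coverage(1000, [(1, 600), (550, 1000)], [(1, 480), (500, 920)]))
```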
1.3 TL-DNA border repeats

Genomic hybridization and DNA sequence analyses of the TL-DNA region
integrated into the genome of C. arvensis clone 7 showed the exact
location of a left plant/T-DNA junction and an approximate position for a
right pRi TL-DNA/plant junction (F. Leach (1983) Ph.D. Thesis, Universite
de Paris-Sud, Centre d'Orsay, France). The left plant DNA/T-DNA junction
was between positions 570 and 571, as defined in Fig. 2. The left 25 bp
T-DNA border repeat sequence was located between positions 520 and 544.
The right boundary of TL-DNA of Ri A4-transformed C. arvensis could vary
over an 8 kbp region. The complete 21,126 bp pRi TL-DNA region was
scanned for the presence of a 25 bp consensus sequence derived by
comparison with published sequences, 5' TGGCAGGATATAT^^GCTAA^j 3'.
Twenty-seven nucleotide sequences matching this consensus at 15 or more
bases were identified. Included among these sequences were the 25 bp
nucleotide sequences starting (5') at positions 520 (matching at 23 of 25
bases) and 19,966 (17 of 25) (see Fig. 2). These two positions were near
the T-DNA/plant junctions of a transformed Nicotiana glauca tissue (F. F.
White et al. (1983) Nature 301:348-350) and C. arvensis clone 7, as
determined by comparison of genomic restriction maps of transformed plant
DNA and pRi A4 DNA. Other matches were found at positions 154, 576, 725,
3244, 6316, 6365, 7209, 7379, 8697, 10339, 10436, 11079, 11232, 12313,
13832, 14235, 14510, 15145, 16285, 17071, 17483, 18121, 18273, 18368, and
18797. The eleven previously published 25 bp border repeat sequences were
as little as 64% homologous to each other, suggesting that many of these
pRi border sequences could be functional. Genomic hybridization analysis
of the pRi TL-DNA region in tobacco (D. Tepfer (1984) Cell 37:959-967)
showed a much smaller TL-DNA with the left junction probably involving a
border sequence at either position 6316 or 6365.
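The border-repeat search above amounts to sliding a 25 bp window along the sequence and keeping windows that agree with the consensus at 15 or more positions. A minimal sketch of such a mismatch-tolerant scan follows; it scans one strand only, and because the consensus printed above is partly illegible, the usage example substitutes a hypothetical 10 bp consensus and threshold.

```python
def matches(window: str, consensus: str) -> int:
    """Number of positions at which window and consensus agree."""
    return sum(a == b for a, b in zip(window, consensus))


def scan_for_border(seq: str, consensus: str, min_matches: int) -> list[tuple[int, int]]:
    """(1-based start, matches) for each window agreeing at >= min_matches bases."""
    s = seq.upper()
    k = len(consensus)
    hits = []
    for i in range(len(s) - k + 1):
        m = matches(s[i:i + k], consensus.upper())
        if m >= min_matches:
            hits.append((i + 1, m))
    return hits


# Hypothetical toy consensus; with the real 25 bp consensus the threshold would be 15.
print(scan_for_border("TTACGTACGTACTT", "ACGTACGTAC", 8))
```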

1.4 Identification of open reading frames

Analysis of the nucleotide sequence presented in Fig. 2 revealed the
presence of sixteen ORFs starting with an ATG initiation codon and
extending over 300 nucleotides. The locations, sizes, and molecular weights
of the putative translation products of these ORFs are listed in
Table 3. Two additional ORFs (9 and 10) were shorter than 300 nucleotides
but were included in Table 3 because they satisfied other criteria (see
below). The size of the ORFs ranged from 255 nucleotides (ORF 9) up to
2280 nucleotides (ORF 8), encoding polypeptides ranging in size from 9600
to 85,000 daltons, respectively. However, the actual size of an RNA
transcript encoding an ORF could be considerably larger than that listed in
Table 3 because 5' and 3' noncoding regions and 3'-polyadenylic acid tails
were not included.
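The ORF criterion used above (an ATG initiation codon followed by more than 300 nucleotides of open frame) can be sketched as a six-frame scan. This is a minimal illustration with hypothetical helper names, not the analysis software actually used; it reports each ORF from the first ATG after the previous in-frame stop and ignores frames that run off the end of the sequence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(seq: str, min_len: int = 300) -> list[tuple[str, int, int]]:
    """ATG-initiated ORFs of at least min_len nt (ATG through stop codon).

    Returns (strand, start, end) with 0-based, end-exclusive coordinates on
    the strand that was scanned ('-' coordinates refer to the reverse
    complement). Each ORF starts at the first ATG after the previous
    in-frame stop; ORFs running off the end of the sequence are ignored.
    """
    orfs = []
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if i + 3 - start >= min_len:
                        orfs.append((strand, start, i + 3))
                    start = None
    return orfs


print(find_orfs("CCATGAAATTTTAGCC", min_len=9))  # toy sequence, small threshold
```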

Though to date no introns have been found in any of the fourteen
sequenced pTi T-DNA genes (R. F. Barker et al. (1983) Plant Mol. Biol.
2:335-350; J. Gielen et al. (1984) EMBO J. 3:835-846), introns are present
in some plant nuclear genes, so pRi TL-DNA genes could have introns.
Transcript mapping (Example 1.9) did not generally indicate spliced mRNA.
However, analysis of mRNA encoded between positions 6500 and 9000
detected two transcripts, a 2300 base transcript as predicted for ORF 8
and an unpredicted 650 base transcript. The nucleotide sequence of the
only other ORF in this region, ORF 9, suggested a transcript of about 450
bases, considerably smaller than that found. The coding region of ORF 8
was scanned for sequences which matched consensus donor
(5'exon...TG*GT^AGT...intron3', the "*" indicating the splice site) and
acceptor (intron...^JJ]STAG*6^...exon) intron splice sequences and
conformed to the G-T/A-G rule (R. Breathnach et al. (1978) Proc. Natl.
Acad. Sci. USA 75:4853-4857) and a plant consensus sequence (J. L.
Slightom et al. (1983) Proc. Natl. Acad. Sci. USA 80:1897-1901). Splicing
between an acceptor at position 8943 and a donor at position 7283, 7327,
7374, 7701, or 7894 would result in a second transcript having a
translation initiation codon-polyadenylation site distance of 724, 758,
943, 1270, or 1325 bp, respectively, which is in the size range observed.
Proper processing of intron-containing genes in T-DNA has been observed
(e.g. N. Murai et al. (1983) Science 222:476-482).
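The first filter in the splice-site search above is the G-T/A-G rule: an intron begins with GT and ends with AG. The minimal sketch below enumerates candidate introns using only that dinucleotide rule (it does not score the fuller donor and acceptor consensus sequences) and shows the spliced product; the function names, the minimum intron length, and the toy sequence are hypothetical.

```python
def gt_ag_introns(seq: str, min_intron: int = 60) -> list[tuple[int, int]]:
    """Candidate introns obeying the GT-AG rule.

    Returns (first, last) 1-based positions of intron bases: the intron
    begins with GT, ends with AG, and is at least min_intron nt long.
    """
    s = seq.upper()
    donors = [i for i in range(len(s) - 1) if s[i:i + 2] == "GT"]              # index of G in GT
    acceptor_ends = [i + 1 for i in range(len(s) - 1) if s[i:i + 2] == "AG"]   # index of G in AG
    return [
        (d + 1, a + 1)
        for d in donors
        for a in acceptor_ends
        if a - d + 1 >= min_intron
    ]


def splice(seq: str, first: int, last: int) -> str:
    """Remove intron bases first..last (1-based, inclusive)."""
    return seq[:first - 1] + seq[last:]


candidates = gt_ag_introns("AAGTCCCCAGAA", min_intron=4)
print(candidates, splice("AAGTCCCCAGAA", *candidates[0]))
```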

No homology greater than random was found to exist in coding or
noncoding sequences between pRi TL-DNA and octopine pTi T-DNA (Barker
et al., supra), consistent with the lack of cross-hybridization between
pRi TL-DNA and octopine pTi T-DNA observed by G. A. Huffman et al. (1984)
J. Bacteriol. 157:269-276, and L. Jouanin (1984) Plasmid 12:91-102.
1.5 Translational initiation codons

Eukaryotic translation is preferentially initiated at the first AUG
of an mRNA, and A or G at position -3 and G at position +4 may facilitate
recognition of functional AUG codons. This (A/G)XXAUGG consensus is
referred to as the ribosome binding site (M. Kozak (1981) Nucl. Acids Res.
9:5233-5252; M. Kozak (1983) Cell 34:971-978). The number of amino acids
and calculated molecular weights for the putative pRi TL-DNA protein
products (Table 3) were derived by assigning the first in-phase AUG codon
as the initiator codon. The art has not ruled out use of secondary AUG
codons as translation initiation codons (M. Kozak (1983) Microbiol. Rev.
47:1-45).

Initiator codon DNA sequences are listed in Table 3 below the consensus
eukaryotic ribosome binding site. Eight of the eighteen ORFs had first AUG
codons which conform with this consensus sequence (ORFs 1, 7, 8, 10, 11,
12, 14, and 18). Of the ten remaining ORFs, four had downstream, in-phase
AUG codons which conformed with the consensus sequence: ORF 2, 287 bp
downstream; ORF 3, 160 bp; ORF 6, 344 bp; ORF 13, 203 bp; and ORF 17,
105 bp (see Fig. 2). The remaining six ORFs (2, 4, 5, 9, 15, and 16) did
not have any AUG codons which conform to the consensus sequence followed
by 300 bp in-phase ORFs. The presence of a consensus ribosome binding AUG
codon is not necessary for translation initiation of T-DNA mRNAs; four
abundantly transcribed octopine pTi TL-DNA genes are initiated at AUG
codons which do not conform to the consensus sequences.
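The two-position rule described above (A or G at -3, G at +4, counting the A of AUG as +1) and the search for a downstream, in-phase AUG in that context can be sketched as follows. The helper names and toy sequences are hypothetical; indices are 0-based, and the in-frame search stops at the first in-frame stop codon.

```python
STOPS = {"TAA", "TAG", "TGA"}


def kozak_ok(seq: str, atg: int) -> bool:
    """True if the ATG at 0-based index atg has A/G at -3 and G at +4.

    The A of ATG is +1, so -3 is three bases 5' of it and +4 is the base
    immediately after the codon. U is accepted in place of T.
    """
    s = seq.upper().replace("U", "T")
    if s[atg:atg + 3] != "ATG":
        raise ValueError(f"no ATG at index {atg}")
    minus3 = s[atg - 3] if atg >= 3 else ""
    plus4 = s[atg + 3] if atg + 3 < len(s) else ""
    return minus3 in {"A", "G"} and plus4 == "G"


def first_conforming_atg(seq: str):
    """Return (index of first ATG, index of first in-frame ATG in consensus context)."""
    s = seq.upper().replace("U", "T")
    first = s.find("ATG")
    if first < 0:
        return None, None
    for i in range(first, len(s) - 2, 3):
        codon = s[i:i + 3]
        if codon in STOPS:
            break
        if codon == "ATG" and kozak_ok(s, i):
            return first, i
    return first, None


print(first_conforming_atg("TTTATGAAAGCCATGGCTAA"))
```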

Several pTi T-DNA ORFs are actively transcribed in E. coli minicells
(G. Schroder et al. (1983) EMBO J. 2:403-409). Translational initiation in
E. coli and most prokaryotes generally starts at an AUG codon that is
preceded by a G-rich ribosome binding site (J. Shine and L. Dalgarno
(1974) Proc. Natl. Acad. Sci. USA 71:1342-1346). Sequences which may
function as prokaryotic ribosome binding sites were observed ahead of the
pRi TL-DNA ORF 4, 5, 7, 9, and 18 initiation codons.
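A simple way to flag a possible prokaryotic ribosome binding site is to look a short distance 5' of the ATG for a piece of the purine-rich Shine-Dalgarno core. The sketch below is a heuristic under stated assumptions: the AGGAGG core, the 4-15 nt window, and the 4-base minimum match are illustrative choices, not the criteria used in this work, and the function name is hypothetical.

```python
SD_CORE = "AGGAGG"  # assumed Shine-Dalgarno core, for illustration only


def has_sd_like(seq: str, atg: int, window: tuple[int, int] = (4, 15), min_len: int = 4) -> bool:
    """True if the region between window[1] and window[0] nt 5' of the ATG
    (0-based index atg) contains any min_len-base piece of SD_CORE."""
    s = seq.upper()
    lo = max(0, atg - window[1])
    hi = max(0, atg - window[0])
    region = s[lo:hi]
    pieces = {SD_CORE[i:i + min_len] for i in range(len(SD_CORE) - min_len + 1)}
    return any(region[j:j + min_len] in pieces for j in range(len(region) - min_len + 1))


print(has_sd_like("TTTAAGGAGGTTTTTATG", 15))
```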


1.6 Codon usage 

Most pRi TL-DNA ORFs were observed to fit pTi TL-DNA codon preference
patterns, suggesting that they are functional after integration into a
plant genome; notable exceptions were ORFs 7 and 18.
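The text does not state how codon preference patterns were compared. The sketch below tabulates relative codon frequencies for an in-frame ORF and compares two profiles with a total variation distance; the choice of distance and the function names are assumptions made for illustration.

```python
from collections import Counter


def codon_usage(orf: str) -> dict[str, float]:
    """Relative frequency of each codon in an in-frame coding sequence."""
    s = orf.upper()
    if not s or len(s) % 3:
        raise ValueError("sequence must be non-empty and a multiple of 3 long")
    counts = Counter(s[i:i + 3] for i in range(0, len(s), 3))
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}


def usage_distance(p: dict[str, float], q: dict[str, float]) -> float:
    """Total variation distance between two codon-usage profiles (0 = identical, 1 = disjoint)."""
    codons = set(p) | set(q)
    return 0.5 * sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in codons)


print(codon_usage("ATGAAAAAATAA"))
```

In practice the profile for a candidate ORF would be compared with one pooled from known transcribed genes; a large distance would flag atypical ORFs such as 7 and 18.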
1.7 Locations of transcription controlling sequences

Comparisons of nucleotide sequences from the 5'-flanking regions of many
eukaryotic genes have revealed consensus locations and sequences of
several DNA elements which may be important in regulating RNA polymerase
II-mediated transcription (S. L. McKnight and R. Kingsbury (1982) Science
217:316-324). These characteristic eukaryotic promoter elements are the
"TATA-element", located 25-30 bp upstream (5') from the start of
transcription, and the "CCAAT-element", located 40-50 nucleotides upstream
from the TATA-element (C. Benoist et al. (1980) Nucl. Acids Res.
8:127-142; A. Efstratiadis et al. (1980) Cell 21:653-668). Similar
promoter elements have been found in the 5'-flanking regions of many plant
and pTi T-DNA genes; pTi15955 T-DNA (Barker et al., supra) and pTiAch5
TL-DNA (Gielen et al., supra) have sequences resembling these TATA and
CCAAT promoter elements located in the 5'-flanking regions of eight TL-DNA
and six TR-DNA ORFs (i.e. they have "eukaryotic-looking" promoters). All
eight eukaryotic-looking pTi TL-DNA ORFs are transcribed, and at least
five of the six eukaryotic-looking pTi TR-DNA ORFs are known to be
transcribed.

The presence of TATA and CCAAT promoter elements in the 5'-flanking
regions of pRi TL-DNA ORFs suggested that a particular ORF was part of a
functional gene. Most pRi TL-DNA ORFs (16 of 18) were flanked by sequences
(Table 3) that closely resembled these eukaryotic promoter elements. The
amount of sequence identity between the promoter elements and the
consensus sequences was very high; ORFs 2 and 12 had promoter elements
which matched the consensus sequences, while the promoter elements from
the other thirteen ORFs did not vary by more than three mismatches. These
results were consistent with the degree of homology found for promoter
elements from pTi T-DNA ORFs (Barker et al., supra; Gielen et al., supra).
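The mismatch-tolerant matching of promoter elements, together with the spacing rule that a CCAAT-element lies 40-50 nucleotides upstream of the TATA-element, can be sketched as below. The stand-in consensus sequences TATAAA and CCAAT, the start-to-start spacing convention, and the function names are assumptions for illustration; the consensus sequences actually used are those shown in Table 3.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_element(seq: str, consensus: str, max_mismatches: int) -> list[tuple[int, int]]:
    """(0-based start, mismatches) for every window within max_mismatches of consensus."""
    s = seq.upper()
    k = len(consensus)
    hits = []
    for i in range(len(s) - k + 1):
        d = hamming(s[i:i + k], consensus)
        if d <= max_mismatches:
            hits.append((i, d))
    return hits


def promoter_candidates(seq, tata="TATAAA", ccaat="CCAAT", max_mismatches=3, spacing=(40, 50)):
    """(CCAAT start, TATA start) pairs whose start-to-start distance lies within spacing."""
    tatas = [i for i, _ in find_element(seq, tata, max_mismatches)]
    ccaats = [i for i, _ in find_element(seq, ccaat, max_mismatches)]
    return [(c, t) for t in tatas for c in ccaats if spacing[0] <= t - c <= spacing[1]]


example = "GG" + "CCAAT" + "G" * 40 + "TATAAA" + "GGG"  # hypothetical sequence
print(promoter_candidates(example, max_mismatches=1))
```

With up to three mismatches allowed per element, as in the comparison above, short elements match often by chance, which is why the spacing constraint is a useful second filter.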

pRi TL-DNA open reading frames 1, 4, 8, 10, 13, 14, and 17 were flanked by
multiple promoter elements. ORFs 7 and 18 were not flanked by sequences
resembling eukaryotic promoter elements and were not expected to be
transcribed in plant tissues. ORFs 4, 5, 7, and 9 overlapped ORFs 5, 6,
and 8 on the opposite strand (Fig. 1, Table 2); the larger ORFs (5, 6, and
8) were more likely to be transcribed because DNA encoding overlapping,
antiparallel ORFs in pTi T-DNA was found to be transcribed from either one
strand or the other (Gielen et al., supra).
Comparison of polyadenylation sites present in the 3'-noncoding
regions of plant genes indicates a preference for the hexanucleotide
AATAAA (J. Messing et al. (1983) in Genetic Engineering of Plants, ed.
A. Hollaender, pp. 211-227); however, variations have been observed for
plant genes, e.g. AATAAG and GATAAA. Many pTi T-DNA ORFs are also
followed by AATAAA sequences. The remaining pTi T-DNA ORFs are followed
by polyadenylation sites which vary only slightly, e.g. AATAAT, TATAAA, or
AATGAA; AATAAT is known to function for the ocs gene (H. DeGreve et al.
(1982) J. Mol. Appl. Genet. 499-511).

Presumptive pRi TL-DNA polyadenylation sites and their locations are
listed in Table 3. Ten ORFs (2, 4, 6, 8, 9, 11, 12, 13, 14, and 15) had
the consensus hexanucleotide AATAAA near their 3'-ends, whereas only two
(ORFs 7 and 18) did not contain any related sequence (Table 3, Fig. 2).
The remaining ORFs (1, 3, 10, and 16) had polyadenylation sites closely
related to those described above. ORFs 8, 10, 12, 13, and 14 were
followed by multiple polyadenylation signals. Multiple polyadenylation
sites have also been observed in several pTi T-DNA genes (P. Dhaese et al.
(1983) EMBO J. 2:419-426; Gielen et al., supra).
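The search for presumptive polyadenylation sites can be sketched as a scan of the region 3' of a stop codon for the consensus hexanucleotide and the variants named above. The 300-nucleotide search window and the function name are illustrative assumptions; note that TATAAA doubles as a TATA-element sequence, so hits must be interpreted by position.

```python
# Hexamers named in the text: the plant consensus and the observed variants.
POLYA_SIGNALS = {"AATAAA", "AATAAG", "GATAAA", "AATAAT", "TATAAA", "AATGAA"}


def polya_signals(seq: str, stop_end: int, window: int = 300) -> list[tuple[int, str]]:
    """0-based positions of candidate polyadenylation hexamers within `window`
    nucleotides 3' of a stop codon ending at index stop_end (exclusive)."""
    s = seq.upper()
    region = s[stop_end:stop_end + window]
    return [
        (stop_end + i, region[i:i + 6])
        for i in range(len(region) - 5)
        if region[i:i + 6] in POLYA_SIGNALS
    ]


example = "ATGTAA" + "CC" + "AATAAA" + "GG" + "AATAAT"  # hypothetical sequence
print(polya_signals(example, stop_end=6))
```

Reporting every hit, rather than only the first, reflects the observation that several ORFs are followed by multiple polyadenylation signals.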

1.8 ORF locations with respect to base composition

The G+C content of the large Agrobacterium plasmids is about 59%
(S. Sheikholeslam et al. (1979) Phytopathology 69:54-58). In contrast,
pRi TL-DNA had very A+T-rich regions flanking the eukaryotic ORFs, while
coding regions had G+C contents in the range of 50%. Plant genes can also
have A+T-rich flanking sequences.
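Contrasts such as A+T-rich flanks versus roughly 50% G+C coding regions are usually seen in a sliding-window G+C profile. A minimal sketch follows; the default window and step sizes are illustrative assumptions, and ambiguous symbols are excluded from the denominator.

```python
def gc_fraction(seq: str) -> float:
    """G+C fraction over unambiguous bases (A, C, G, T); other symbols are ignored."""
    s = seq.upper()
    acgt = sum(s.count(b) for b in "ACGT")
    return (s.count("G") + s.count("C")) / acgt if acgt else 0.0


def gc_profile(seq: str, window: int = 100, step: int = 50) -> list[tuple[int, float]]:
    """(0-based window start, G+C fraction) for sliding windows along seq."""
    last_start = max(len(seq) - window, 0)
    return [(i, gc_fraction(seq[i:i + window])) for i in range(0, last_start + 1, step)]


print(gc_profile("GGGGAAAA", window=4, step=4))
```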

1.9 Detection of transcripts 

The TL-DNA left junction with plant DNA found in an A. rhizogenes-transformed
tobacco tissue, clone 9, was between the position 6361 Hind III site and
the position 7535 EcoR I site, while the right border was to the right of
the position 19,918 Kpn I site (see Example 1.3). Hybridization of
nick-translated pRi TL-DNA probes to membrane filter-bound replicas of RNA
gels ("Northern blots") clearly showed transcripts carrying ORFs 8 and 13.
An observed transcript of about 950 nucleotides which hybridized with pRi
TL-DNA between EcoR I sites at positions 9077 and 13,445 was assigned to
ORF 11. An observed transcript of about 1400 nucleotides which hybridized
with sequences spanning the position 17,059 EcoR I site was assigned to
ORF 15. An observed transcript of about 800 nucleotides which hybridized
with pRi TL-DNA between the positions 9077 and 13,445 EcoR I sites was
assigned to ORF 12.

The relative abundances of pRi TL-DNA transcripts in clone 9-derived
plants were observed to be a function of organ (leaves vs. roots) and
phenotype (T vs. V; see Tepfer (1984) supra). With the exception of the
transcript corresponding to ORF 12, pRi TL-DNA transcripts were more
prevalent in roots than in leaves, with a particularly striking case being
the mRNA assigned to ORF 15. Expression of the transcript assigned to
ORF 12 was leaf specific and was correlated with the V phenotype.

RNA from C. arvensis tissue transformed by pRi TL-DNA which included
sequences encoding ORFs 1-6 also hybridized with pRi TL-DNA.

1.10 Conclusions

The data discussed above (Examples 1.2, 1.4-1.8) indicated that of the
ORFs flanked by eukaryotic transcription controlling sequences (ORFs 1, 2,
3, 4, 5, 6, 8, 9, 10, 11, 12, 13, 14, 15, 16, and 17), ORFs 1, 2, 3, 6, 8,
11, 12, 13, 14, 15, and 16 were most likely to be transcribed. In tobacco
tissue transformed by DNA encoding ORFs 8-18, transcription of the DNA
region encoding ORFs 8, 11, 12, 13, and 15 has been detected (Example
1.9).

Example 2 

This Example discloses materials and methods used to obtain the
results disclosed in Example 1.

2.1 Materials

Restriction endonucleases Ava I, BamH I, BsJ_II, EcoR I, Hind III, Kpn I,
Pst I, Pvu II, Sal I, Stu I, Xba I, and Xho I were obtained from Promega
Biotec. Enzymes Acc I, Cla I, Dra I, Jtetl, JiiLl, il££^, and Xor II were
obtained from New England Biolabs. Polynucleotide kinase was from P-L
Biochemicals and bovine alkaline phosphatase was from Boehringer-Mannheim.
[γ-32P]ATP (2000-3000 Ci/mmole) was obtained from New England Nuclear.
Chemicals used for DNA sequencing were obtained from the vendors
recommended by A. M. Maxam and W. Gilbert (1980) Meth. Enzymol.
65:499-560. X-ray film XAR-351 on rolls (20 cm x 25 m) was obtained from
Kodak. DuPont Quanta III intensifying screens (35 cm x 1 m) were cut in
half to fit sequencing gels (17.5 cm x 1 m). DNA sequencing gel stands,
designed for gels measuring 20 cm x 104 cm, and safety cabinets were from
Fotodyne Inc., New Berlin, Wisconsin. Water-jacket thermostatting plates
were constructed using 1/4 inch thick plate glass glued together by 100%
silicone rubber.

2.2 DNA isolation 

Procedures for the isolation and mapping of plasmid and cosmid
subclones of the closely related Ri plasmids pRiA4 and pRiHRI have been
published: A4 subclones EcoR I e36 (EcoR I 3a), BamH I 8a, and e16
(contains Ri EcoR I fragments 15, 36, and 37a) by F. Leach (1983) Ph.D.
Thesis, Universite de Paris-Sud, Centre d'Orsay; and pRiHRI subclones
pLJ40 (i.e. cosmid 40) and EcoR I 3b by L. Jouanin (1984) Plasmid
12:91-102. Plasmid DNAs were prepared as described by H. C. Birnboim and
J. Doly (1979) Nucl. Acids Res. 7:1513-1523, followed by two
CsCl-ethidium bromide gradient bandings.

2.3 DNA sequencing 

DNA sequences were determined using the chemical method, essentially
as described by Maxam and Gilbert, supra. Generally, 10-20 µg of plasmid
DNA was digested with the appropriate restriction enzyme, followed by
removal of the 5'-terminal phosphate with 2-3 units of calf intestinal
alkaline phosphatase. Reactions were done in 100 mM Tris, pH 8.4, at 55°C
for 30 min. Both restriction enzyme and phosphatase were removed by two
phenol extractions and one chloroform extraction. DNA samples were then
precipitated with ethanol, desalted with 70% ethanol, dried, and then
resuspended in 15 µl denaturation buffer (50 mM Tris-HCl (pH 9.5), 5 mM
spermidine, and 0.5 mM EDTA) and 15 µl H2O. End-labeling with [γ-32P]ATP
and isolation of end-labeled fragments were as described by Maxam and
Gilbert, supra. Care was taken to avoid sequencing errors resulting from
the presence of hydrazine-unreactive 5-methylcytosine bases, found after
growth in E. coli at the second cytosine of EcoR II or BstN I restriction
enzyme sites (J. L. Slightom et al. (1980) Cell 21:627-638).
Conditions for chemical reactions, at 20°C, were as follows: 1 µl
dimethyl sulfate for G, 30 sec; 30 µl of 95% formic acid for A, 2.5 min;
30 µl of 95% hydrazine for C+T and C, 2.5 min. DNA samples were
electrophoresed at 2500 V constant voltage on gels 20 cm wide, 104 cm long
and 0.2 mm thick. Constant gel temperatures (50°C) were maintained using a
water-jacketed plate on one side of the gel sandwich. The opposite plate
of the sandwich was treated with γ-methacryloxypropyltrimethoxysilane
(Sigma 6514) as described by H. Garoff and W. Ansorge (1980) Analyt.
Biochem. 115:450-457, to bind the acrylamide chemically to the glass. Gel
pouring, loading, and autoradiography have been described by R. F. Barker
et al. (1983) Plant Mol. Biol. 2:335-350, and J. L. Slightom et al.
(1983) Proc. Natl. Acad. Sci. USA 80:1897-1901.

Computer programs for DNA sequence analysis were supplied by the
University of Wisconsin Genetics Computer Group.
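Reading a chemical-sequencing autoradiogram follows from the base-specific reactions listed above: a band in the G lane is read as G, in the A lane as A, in both the C+T and C lanes as C, and in the C+T lane alone as T. The sketch below is an idealized illustration with a hypothetical function name and toy band positions numbered from the labeled 5' end. It does not model band spacing, compressions, or missing bands, and because G is tested first it still reads correctly if the purine reaction also cleaves at G.

```python
def read_ladder(lanes: dict[str, set[int]]) -> str:
    """Idealized base calling from a four-lane chemical-sequencing ladder.

    lanes maps 'G', 'A', 'C+T' and 'C' to the set of band positions, numbered
    from the labeled 5' end (shortest fragment = 1). Rules: a band in G -> G;
    in A -> A; in both C+T and C -> C; in C+T only -> T.
    """
    called = []
    for pos in sorted(set().union(*lanes.values())):
        if pos in lanes["G"]:
            called.append("G")
        elif pos in lanes["A"]:
            called.append("A")
        elif pos in lanes["C"]:
            called.append("C")
        else:
            called.append("T")  # band present only in the C+T lane
    return "".join(called)


toy_lanes = {"G": {1, 4}, "A": {2}, "C+T": {3, 5}, "C": {5}}  # hypothetical bands
print(read_ladder(toy_lanes))
```

The hydrazine-unreactive 5-methylcytosine mentioned above would appear as a gap in both the C+T and C lanes, which this idealized reader would simply skip; that is why such positions need independent confirmation.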


Example 3 

This Example teaches the manipulation of pRi TL-DNA TxCSs preparatory
to insertion of a foreign structural gene.

3.1 Removal of NdeI sites from an M13-based vector

These Examples make extensive use of oligonucleotide-directed
site-specific mutagenesis of DNA (see Example 5.2). Although individuals
skilled in the art may choose to use double-stranded DNA methods for such
mutagenesis, single-stranded methods are used as exemplified herein. In
general, single-stranded methods utilize M13-based vectors having inserted
E. coli lac gene sequences. Wild-type M13 contains three NdeI sites while
the lac sequences contain no NdeI site; BamHI sites are absent from both
M13 and lac. Removal of these NdeI sites by site-specific mutagenesis,
described below, may prove essential when replacing a T-DNA structural
gene with a heterologous foreign structural gene (Example 6.1). M13-based
vectors include mWB2341 and related vectors (W. M. Barnes et al. (1983)
Meth. Enzymol. 101:98-122; W. M. Barnes and M. Bevan (1983) Nucl. Acids
Res. 11:349-368), and the M13mp series of vectors (e.g. see J. Norrander
et al. (1983) Gene 26:101-106; J. Messing and J. Vieira (1982) Gene
19:269-276). mWB2341 and related vectors are linearized by digestion with
EcoRI and HindIII, and the resultant sticky ends are converted to blunt
ends by incubation with the Klenow fragment of E. coli DNA polymerase I.
Most of the M13mp series vectors can be linearized by at least one
blunt-end-forming restriction endonuclease (e.g. SmaI or HincII). In the
alternative, particular single-stranded DNA vectors may be preferred for
some operations; other vectors may be substituted for those referred to
above with minor modification of the procedures described herein, as will
be understood by those of ordinary skill in the art. Also in the
alternative, double-stranded DNA vectors might be substituted (see
references cited in Example 5.2).

Single-stranded DNA (ssDNA) of the viral form of an M13-based vector
is isolated and subjected to oligonucleotide-directed site-specific
mutagenesis, described in detail in Examples 3.3 and 5, after
hybridization to 5'CAATAGAAAATTCATAGGGTTTACC3',
5'CCTGTTTAGTATCATAGCGTTATAC3', and 5'CATGTCAATCATTTGTACCCCGGTTG3', thereby
removing three NdeI sites which would later prove inconvenient, without
changing the translational properties of the encoded proteins. A mutated
M13-based vector lacking the three NdeI sites is identified and designated
m13(Nde).
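Because the NdeI site (CATATG) is palindromic, checking one strand of each mutagenic oligonucleotide is enough to confirm that it no longer carries the site. A minimal sketch using the three oligonucleotide sequences as given above; the helper name `site_positions` is hypothetical.

```python
NDEI = "CATATG"   # palindromic, so scanning one strand suffices
BAMHI = "GGATCC"  # also palindromic


def site_positions(seq: str, site: str) -> list[int]:
    """0-based start positions of every (possibly overlapping) match."""
    seq, site = seq.upper(), site.upper()
    return [i for i in range(len(seq) - len(site) + 1)
            if seq[i:i + len(site)] == site]


# Oligonucleotides used to remove the three M13 NdeI sites (5' to 3').
M13_NDEI_REMOVAL = [
    "CAATAGAAAATTCATAGGGTTTACC",
    "CCTGTTTAGTATCATAGCGTTATAC",
    "CATGTCAATCATTTGTACCCCGGTTG",
]

if __name__ == "__main__":
    for oligo in M13_NDEI_REMOVAL:
        print(oligo, site_positions(oligo, NDEI), site_positions(oligo, BAMHI))
```

The same scan, applied to the RF DNA of candidate clones, is a quick in-silico counterpart to the restriction analysis used to identify m13(Nde).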



3.2 Subcloning pRi TL-DNA into an M13-based vector

DNA of a plasmid listed in Table 4, column 1 (e.g. pLJ40 for
manipulations of the ORF 11, 12, and 13 promoters and polyadenylation
sites; see Example 2.2 for the sources of these plasmids) is isolated and
digested to completion with the restriction enzyme(s) listed in Table 4,
column 2 (e.g. SmaI and MstII for ORFs 11, 12, and 13). DNAs of e36 and
pLJ40 are respectively harbored by the deposited strains NRRL B-15958 and
NRRL B-15957. (Alternatively, pRiA4 DNA or pRiHRI DNA may be isolated and
digested with the enzyme(s) listed in Table 4, column 2.) 5'- or
3'-protruding ends are then converted to blunt ends by incubation with the
Klenow fragment of E. coli DNA polymerase I or T4 DNA polymerase,
respectively, and all four deoxynucleoside triphosphates. The resulting
mixture of DNA fragments is separated by agarose gel electrophoresis, and
a fragment whose size is listed in Table 4, column 3 (e.g. 5.2 kbp for
ORFs 11, 12, and 13) is eluted from the gel.
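The choice between the Klenow fragment and T4 DNA polymerase follows from which strand protrudes after cutting. A minimal sketch of that rule, assuming palindromic recognition sites and the standard cut positions of SmaI and MstII (the enzymes used for ORFs 11, 12, and 13), with HindIII and KpnI for comparison; the function names are hypothetical.

```python
def overhang(site: str) -> tuple[str, int]:
    """Classify the end an enzyme leaves.

    `site` is the recognition sequence with '^' at the top-strand cut,
    e.g. 'A^AGCTT'. Assumes a palindromic site, so the bottom-strand cut
    lies at len(site) - k in top-strand coordinates."""
    k = site.index("^")
    n = len(site) - 1            # length without the '^'
    bottom = n - k
    if k < bottom:
        return "5'", bottom - k
    if k > bottom:
        return "3'", k - bottom
    return "blunt", 0


def blunting_step(site: str) -> str:
    kind, _ = overhang(site)
    if kind == "5'":
        return "fill in: Klenow fragment + all four dNTPs"
    if kind == "3'":
        return "trim: T4 DNA polymerase + all four dNTPs"
    return "no treatment needed"


if __name__ == "__main__":
    for name, site in [("SmaI", "CCC^GGG"), ("MstII", "CC^TNAGG"),
                       ("HindIII", "A^AGCTT"), ("KpnI", "GGTAC^C")]:
        print(name, overhang(site), blunting_step(site))
```

For the SmaI/MstII example, only the MstII end (a 3-nucleotide 5' extension) needs filling in; the SmaI end is already blunt.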

Covalently-closed-circular DNA (cccDNA) of the replicative form (RF)
of the M13-based vector m13(Nde) is isolated, converted to a linear,
blunt-ended DNA, and has its 5'-phosphates removed by incubation with
phosphatase. The resulting linearized vector is purified by gel
electrophoresis and is mixed with and ligated to the T-DNA fragment
isolated above. After transformation of the resulting mixture into
E. coli, viral DNAs and RFs are isolated from transformants and screened
by restriction and hybridization analysis for the presence of inserts
which, when in single-stranded viral form, are complementary to the
sequence as presented in Fig. 1 and which carry the complete DNA sequence
of the ORFs listed in Table 4, column 4. The virus which infects the
selected colony is designated as listed in Table 4, column 5 (e.g. mR4 for
ORFs 11, 12, and 13).

3.3 Removal of endogenous NdeI and BamHI sites from pRi TL-DNA

A vector designated as listed in Table 5, column 1 (e.g. mR4' for
manipulations of the ORF 11, 12, and 13 promoters and polyadenylation
sites) is prepared from the vector listed in the corresponding line of
Table 5, column 2 (e.g. mR4 for ORFs 11, 12, and 13) by primer extension
after hybridization to the oligonucleotides listed in Table 5, column 3
(e.g. 5'GATTAGATAGTCAGATGAGCATGTGC3', 5'GCAAATCGGAGCCCCTCGAATAGG3',
5'GCAATTTGGGAGCCATTGTGATGTGAG3', and 5'CGGTTACGCGGAGCCTATGCGGAGCGCC3' for
ORFs 11, 12, and 13). This operation removes indigenous BamHI and NdeI
sites which may be present and which may prove inconvenient in later
manipulations; the sites designated in Table 5, column 4 are at the pRi
TL-DNA positions listed in column 5 (e.g. for ORFs 11, 12, and 13, an
NdeI site at position 10,305 and BamHI sites at positions 11,198, 11,278,
and 12,816). (Note that there are no BamHI or NdeI sites in mR5.) The
sites may be removed one at a time by hybridization of a particular
oligonucleotide to the ssDNA viral form of the vector listed in Table 5,
column 2 (e.g. mR4 for ORFs 11, 12, and 13), incubation of the
primer/viral DNA complex with the Klenow fragment of E. coli DNA
polymerase I, all four deoxynucleoside triphosphates, and DNA ligase,
enrichment of the resulting cccDNA molecules, transformation into E. coli,
selection of transformants, and isolation of RF followed by restriction
enzyme analysis to identify a clone missing the undesired restriction
sites. These steps are repeated for each site which is to be removed.
Alternatively, the vector listed in Table 5, column 2 (e.g. mR4 for
ORFs 11, 12, and 13) may be simultaneously hybridized to all of the
oligonucleotides listed in Table 5, column 3 and then carried through the
mutagenesis procedure in an attempt to eliminate all of the sites in a
single operation; because the procedure is not 100% efficient, not every
resulting clone will lack all of the sites.
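The trade-off between removing sites one at a time and all at once can be estimated with a simple model. A minimal sketch, assuming (hypothetically) that each oligonucleotide is incorporated independently with the same per-clone probability `p`; actual efficiencies are not given in the text and will vary between oligonucleotides.

```python
def expected_clones(p: float, n_sites: int, simultaneous: bool) -> float:
    """Expected number of clones screened to obtain one clone lacking all
    n_sites sites, under an independent, equal-efficiency model.

    Simultaneous: one round; a clone must carry every mutation (p ** n).
    Sequential: n rounds, each needing on average 1 / p clones."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    if simultaneous:
        return 1.0 / p ** n_sites
    return n_sites / p


if __name__ == "__main__":
    # Four sites (one NdeI, three BamHI), as for ORFs 11, 12, and 13.
    for p in (0.5, 0.8):
        print(p, expected_clones(p, 4, True), expected_clones(p, 4, False))
```

Under this model the single simultaneous round needs more screening per round but avoids repeating the mutagenesis procedure for each site.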

3.4 Placement of novel NdeI and BamHI sites in pRi TL-DNA

A vector designated as listed in Table 6, column 1 (e.g. mORF 11 for
manipulations of the ORF 11 promoter and polyadenylation site) is prepared
from the vector listed in the corresponding line of Table 6, column 2
(e.g. mR4' for ORF 11) by primer extension after hybridization to the
oligonucleotides listed in Table 6, column 3 (e.g.
5'GCTGCGAAGGGATCCCTTTGTCGCC3', 5'CGCAAGCTACAACATCATATGGGGCGG3',
5'GGGATCCATATGTGATGTGAGTTGG3', and 5'GCCTAAGAAGGAATGGTGGATCCATGTACGTGC3'
for ORF 11) as described above and in Example 5. This has the effect of
introducing NdeI sites (5'...CATATG...3') at the translational start site
(ATG) and near the translational stop site (TAA, TGA, or TAG), and of
introducing BamHI sites (5'...GGATCC...3') in the sequences flanking the
T-DNA gene, usually approximately 0.3 kbp from the transcriptional start
and polyadenylation sites. The first and fourth oligonucleotides of each
quartet listed in Table 6, column 3 introduce BamHI sites while the second
and third introduce NdeI sites. These sites are located in the
corresponding pRi TL-DNA at the approximate positions listed in Table 6,
column 4. For example, for manipulation of ORF 11,
5'GCTGCGAAGGGATCCCTTTGTCGCC3' and 5'GCCTAAGAAGGAATGGTGGATCCATGTACGTGC3'
introduce BamHI sites at positions 9,974 and 12,001, respectively, while
5'CGCAAGCTACAACATCATATGGGGCGG3' and 5'GGGATCCATATGTGATGTGAGTTGG3'
introduce NdeI sites at positions 10,679 and 11,286, respectively. The
sizes and locations of the TxCS-carrying DNA segments used in these
Examples may be calculated from the positions listed in Table 6, column 4
and the orientations defined in Table 2 and Fig. 1. Positions listed in
Table 6, column 4, of pairs of NdeI and BamHI sites define
promoter-bearing (P) and polyadenylation site-bearing (A) DNA segments, as
indicated by "P"s and "A"s, respectively, in column 5, the segments having
the approximate sizes indicated in column 6. For example, the ORF 11
promoter is on an approximately 715 bp DNA segment located between
artificial NdeI and BamHI sites at approximate positions 11,286 and
12,001, respectively, while the ORF 11 polyadenylation site is on an
approximately 705 bp DNA segment located between artificial BamHI and NdeI
sites at approximate positions 9,974 and 10,679, respectively. Note that
mORF12-13 and mORF16-17 provide examples of combinations of a promoter and
a polyadenylation site from two different T-DNA genes.
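The segment sizes follow directly from the site positions. A minimal sketch using the ORF 11 positions quoted above; the dictionary keys are hypothetical labels, and because the positions are approximate, so are the results. The NdeI-to-NdeI span is the native ORF 11 region that is later replaced by a foreign structural gene.

```python
# Approximate pRi TL-DNA positions of the artificial ORF 11 sites
# (Table 6, column 4, as quoted in the text).
ORF11 = {
    "BamHI_polyA": 9_974,
    "NdeI_stop": 10_679,
    "NdeI_start": 11_286,
    "BamHI_promoter": 12_001,
}


def span(a: int, b: int) -> int:
    """Distance in bp between two site positions."""
    return abs(b - a)


promoter_bp = span(ORF11["NdeI_start"], ORF11["BamHI_promoter"])
polya_bp = span(ORF11["BamHI_polyA"], ORF11["NdeI_stop"])
native_gene_bp = span(ORF11["NdeI_stop"], ORF11["NdeI_start"])

if __name__ == "__main__":
    print(promoter_bp, polya_bp, native_gene_bp)  # 715 705 607
```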

Example 4

This Example teaches the manipulation of four exemplary foreign
structural genes preparatory to insertion into a pRi TL-DNA TxCS. The
genes are for the proteins phaseolin (a nutritionally important seed
storage protein from Phaseolus vulgaris), P. vulgaris lectin (a
nutritionally important protein found in seeds and other plant tissues
which may be involved in symbiotic nitrogen fixation and in making seeds
unpalatable to herbivores), thaumatin (a protein which tastes sweet to
primates, naturally found in Thaumatococcus daniellii), and crystal
protein (a protein produced by Bacillus thuringiensis which is used
commercially to control larval pests of a large number of lepidopteran
insect species). The crystal protein structural gene used here, though
lacking its 3' end, encodes a protein toxic to insect larvae. Phaseolin,
lectin, and thaumatin are eukaryotic genes; crystal protein is
prokaryotic. Phaseolin contains introns; lectin and crystal protein do
not. The lectin gene itself contains no introns and could be obtained on a
5.7 kbp HindIII fragment from a genomic clone (L. Hoffman (1984) J. Mol.
Appl. Genet. 2:447-453) which is part of a plasmid harbored by the
deposited strain NRRL B-15821 (see also Example 6.4). However, in this
Example the lectin structural gene is obtained from a cDNA clone
(L. M. Hoffman et al. (1982) Nucl. Acids Res. 10:7819-7828), as is the
thaumatin gene.

4.1 Subcloning structural genes into M13 

The genes listed in Table 7, column 1 are carried by the plasmids
listed in Table 7, column 2, which may be isolated from the deposited
strains listed in Table 7, column 3 (e.g. the crystal protein structural
gene is carried by p123/58-10, which is harbored within NRRL B-15612). DNA
of a plasmid listed in Table 7, column 2 is digested to completion with
the restriction enzyme(s) listed in the corresponding row of Table 7,
column 4, and protruding ends are converted to blunt ends by incubation
with the enzyme listed in Table 7, column 5 (e.g. for manipulation of the
crystal protein structural gene, p123/58-10 DNA is digested with HindIII
and the resulting sticky ends are filled in by incubation with the Klenow
fragment of E. coli DNA polymerase I). A DNA fragment whose size is listed
in Table 7, column 6 (e.g. 6.6 kbp for the crystal protein) is isolated by
elution from an agarose gel after electrophoretic separation. The
resulting fragment is mixed with and ligated to dephosphorylated,
blunt-ended, linearized m13(Nde), prepared as described in Example 3.1,
and is transformed into E. coli. Viral DNAs and RFs are isolated from
transformants and screened by restriction and hybridization analyses for
the presence of inserts which, when in single-stranded viral form, are
complementary to the sequence as present in the mRNA. The vector which
infects the selected colony is designated as listed in Table 7, column 7
(e.g. mBtCP for the crystal protein).
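Screening for inserts whose single-stranded viral form is complementary to the mRNA amounts to asking whether the reverse complement of an mRNA-sense probe occurs in the viral strand. A minimal sketch; the probe is the start of the crystal protein coding sequence as carried by the oligonucleotide in Example 4.2, the flanking bases in the demonstration are arbitrary, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def viral_strand_is_antisense(viral_strand: str, mrna_probe: str) -> bool:
    """True if the mRNA-sense probe (written as DNA) can pair with the
    viral strand, i.e. the viral strand carries the complement of the mRNA."""
    return reverse_complement(mrna_probe) in viral_strand.upper()


if __name__ == "__main__":
    probe = "ATGGATAACAATCCG"
    correct = "TTT" + reverse_complement(probe) + "AAA"
    wrong = "TTT" + probe + "AAA"
    print(viral_strand_is_antisense(correct, probe))  # True
    print(viral_strand_is_antisense(wrong, probe))    # False
```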

4.2 Placement of NdeI sites flanking three structural genes

DNA of a vector listed in Table 8, column 1 is used to prepare a
vector designated as listed in Table 8, column 2 by primer extension after
hybridization to the oligonucleotides listed in Table 8, column 3 (e.g.
for crystal protein, mBtCP is used to make mBtCP' by extending the primers
5'GGAGGTAACATATGGATAACAATCCG3' and 5'GCGGCAGATTAACGTGTTCATATGCATTCGAG3')
as described in Examples 3.3 and 5. This has the effect of introducing
NdeI sites at the translational start site and near the translational
stop site; there are no BamHI or NdeI sites present within the structural
gene which might otherwise need to be removed. In the case of the
B. thuringiensis crystal protein gene, a translational stop codon (TAA) is
additionally introduced. The structural genes listed in Table 7, column 1
may be isolated as DNA fragments whose sizes are listed in Table 8,
column 4 after digesting DNA of a vector listed in the corresponding line
of Table 8, column 2 to completion with NdeI (e.g. the crystal protein
structural gene is isolated from mBtCP' on a 2.8 kbp NdeI fragment).
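Because the NdeI recognition sequence CATATG ends in ATG, a site placed at the start codon leaves the reading frame intact. A minimal sketch that translates from the ATG inside the NdeI site of the crystal protein 5' primer quoted above, using the standard genetic code; the function names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq: str) -> str:
    """Translate complete codons starting at the first base of seq."""
    seq = seq.upper()
    return "".join(CODONS[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def translate_from_ndei(seq: str) -> str:
    """Translate from the ATG embedded in the first CATATG (NdeI) site."""
    i = seq.upper().find("CATATG")
    if i < 0:
        raise ValueError("no NdeI site")
    return translate(seq[i + 3:])


if __name__ == "__main__":
    primer = "GGAGGTAACATATGGATAACAATCCG"  # crystal protein 5' primer
    print(translate_from_ndei(primer))  # MDNNP
```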


4.3 Mutagenesis of thaumatin

Thaumatin cDNA-containing vectors have been disclosed by
C. T. Verrips et al., Eur. Pat. applications 54,330 and 54,331, and
L. Edens et al. (1982) Gene 18:1-12. Thaumatin is originally synthesized
as preprothaumatin, the prefix "pre" representing the presence of a
"signal peptide" having the function of causing the export of thaumatin
from the cytoplasm into the endoplasmic reticulum of the cell in which it
is being synthesized, and the prefix "pro" representing that the protein
is not in mature form. A thaumatin cDNA structural gene is present as the
complement to thaumatin mRNA in M13-101-B (Eur. Pat. application 54,331).
The viral form of this vector is used as a source of a thaumatin
structural gene after site-specific mutagenesis directed by two of the
following oligonucleotides: (a) 5'GGCATCATACATCATATGGCCGCCACC3',
(b) 5'CCTCACGCTCTCCCGCGC[illegible]GCCACCTTCGAGATCGTCAACCGC3',
(c) 5'CGAGTAAGAGGATGAAGACGGACATATGAGGATACGC3', or
(d) 5'GGGTCACTTTCTGCCCTACTGCCXAACATATCAAGACGACTAAGAGG3'. When mutated by
oligonucleotides (a) and (c), which bind to the 5'- and 3'-ends of the
structural gene, respectively, a preprothaumatin sequence is excised from
the resultant vector by NdeI digestion. When mutated by oligonucleotides
(b) and (d), which bind to the 5'- and 3'-ends, respectively, a mature
thaumatin sequence is similarly excised. Use of the combinations of (a)
with (d) and (b) with (c) yields fragments encoding what might be termed
prethaumatin and prothaumatin, respectively. All of these sequences are
obtained on fragments of approximately 0.7 kbp having no internal NdeI or
BamHI sites, which may be isolated as usual by gel electrophoresis.
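The four possible pairings of a 5'-end and a 3'-end oligonucleotide can be tabulated as a small lookup. A minimal sketch, inferred from the pairings stated above: since (a) with (c) gives preprothaumatin and (b) with (d) gives mature thaumatin, (a) retains the signal-peptide ("pre") coding sequence and (c) retains the "pro" coding sequence at the 3' end. The names are hypothetical.

```python
# 5'-end oligonucleotides: (a) keeps the signal peptide, (b) starts at mature thaumatin.
KEEPS_PRE = {"a": True, "b": False}
# 3'-end oligonucleotides: (c) keeps the pro extension, (d) ends with mature thaumatin.
KEEPS_PRO = {"c": True, "d": False}

NAMES = {
    (True, True): "preprothaumatin",
    (False, False): "mature thaumatin",
    (True, False): "prethaumatin",
    (False, True): "prothaumatin",
}


def thaumatin_product(five_prime: str, three_prime: str) -> str:
    """Name of the NdeI fragment obtained with the given oligonucleotide pair."""
    return NAMES[(KEEPS_PRE[five_prime], KEEPS_PRO[three_prime])]


if __name__ == "__main__":
    for f in "ab":
        for t in "cd":
            print(f, t, thaumatin_product(f, t))
```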

4.4 Other possible manipulations

Phaseolin and lectin, as initially translated, have signal peptides
at their amino-termini, as is the case with thaumatin. If desired, these
signal peptides may be eliminated by placing the 5'-NdeI site between the
codons forming the junction between the signal peptide and the mature
protein. When under control of a T-DNA TxCS in a plant cell nucleus, such
a structural gene will cause the synthesis of a phaseolin or lectin
protein which is not exported from the cell's cytoplasm. Sequences useful
for designing oligonucleotides for manipulating the phaseolin and lectin
structural genes are reported by J. L. Slightom et al. (1983)
Proc. Natl. Acad. Sci. USA 80:1897-1901, and Hoffman et al. (1982) supra,
respectively.
Example 5

This Example describes techniques for the synthesis and use of
synthetic oligonucleotides. Other useful references can be found in the
list of works cited in the section introductory to these Examples.


5.1 Oligonucleotide synthesis

Chemical synthesis of DNA relies on a number of techniques well known
to those skilled in the art of DNA synthesis. Modification of nucleosides
is described by H. Schaller et al. (1963) J. Amer. Chem. Soc.
85:3821-3827, and H. Buchi and H. G. Khorana (1972) J. Mol. Biol.
72:251-288. Preparation of deoxynucleoside phosphoramidites is described
by S. L. Beaucage and M. H. Caruthers (1981) Tetrahedron Lett.
22:1859-1862. Preparation of solid-phase resin is described by
S. P. Adams et al. (1983) J. Amer. Chem. Soc. 105:661-663. Hybridization
procedures useful during the formation of double-stranded molecules are
described by J. J. Rossi et al. (1982) J. Biol. Chem. 257:9226-9229.

5.2 Oligonucleotide-directed site-specific mutagenesis

General methods of directed mutagenesis have been reviewed by
D. Shortle et al. (1981) Ann. Rev. Genet. 15:265-294. Of special utility
in manipulation of genes is oligonucleotide-directed site-specific
mutagenesis, reviewed recently by C. S. Craik (1985) Biotechniques
3:12-19; M. J. Zoller and M. Smith (1983) Meth. Enzymol. 100:468-500;
M. Smith and S. Gillam (1981) in Genetic Engineering: Principles and
Methods, Vol. 3, eds.: J. K. Setlow and A. Hollaender; and M. Smith (1982)
Trends in Biochem. 7:440-442. This technique permits the change of one or
more base pairs in a DNA sequence or the introduction of small insertions
or deletions. Recent examples of oligonucleotide-directed mutagenesis
include W. Kramer et al. (1984) Nucl. Acids Res. 12:9441-9456; Zoller and
Smith (1983) supra; M. J. Zoller and M. Smith (1982) Nucleic Acids Res.
10:6487-6500; G. Dalbadie-McFarland et al. (1982) Proc. Natl. Acad. Sci.
USA 79:6409-6413; G. F. K. Simons et al. (1982) Nucleic Acids Res.
10:821-832; and C. A. Hutchison III et al. (1978) J. Biol. Chem.
253:6551-6560. Oligonucleotide-directed mutation using double-stranded DNA
vectors is also possible (R. B. Wallace et al. (1980) Science
209:1396-1400; G. P. Vlasuk et al. (1983) J. Biol. Chem. 258:7141-7148;
E. D. Lewis et al. (1983) Proc. Natl. Acad. Sci. USA 80:7065-7069;
Y. Morinaga et al. (1984) Biotechnol. 2:636-639). See Example 3.1 for
useful M13-based vectors.
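Both the crystal protein primer of Example 4.2 and the ORF 11 start-site oligonucleotide of Example 3.4 place CATATG so that its last three bases are the ATG start codon; only the three bases 5' of the ATG need to change, so the coding sequence is untouched. A minimal sketch of choosing those substitutions, assuming a hypothetical template sequence (not from the text) and a hypothetical function name; a real mutagenic primer would be a longer window centred on the change.

```python
def ndei_at_start(template: str, atg_index: int) -> tuple[str, int]:
    """Return (mutated sequence, number of base changes) after rewriting the
    three bases 5' of the ATG at atg_index to CAT, creating CATATG (NdeI)."""
    t = template.upper()
    if atg_index < 3 or t[atg_index:atg_index + 3] != "ATG":
        raise ValueError("atg_index must point at an ATG with 3 bases upstream")
    upstream = t[atg_index - 3:atg_index]
    changes = sum(1 for old, new in zip(upstream, "CAT") if old != new)
    mutated = t[:atg_index - 3] + "CAT" + t[atg_index:]
    return mutated, changes


if __name__ == "__main__":
    template = "AAGGAGGTAACTTATGGCTAGC"  # hypothetical sequence
    mutated, changes = ndei_at_start(template, template.find("ATG"))
    print(mutated, changes)  # AAGGAGGTAACATATGGCTAGC 1
```

Fewer mismatches generally make the mutagenic primer anneal more reliably, which is one reason to check the count before ordering an oligonucleotide.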

Example 6

This Example teaches use of the pRi TL-DNA TxCSs and the foreign
structural genes manipulated in Examples 3 and 4, respectively. Specific
examples of plant transformation vectors, plant transformation, and plant
regeneration are given below in Examples 6.4-6.7.

6.1 Assembly of TxCS/structural gene combinations

A plasmid listed in Table 6, column 1 (e.g. mORF 11) is digested
with NdeI and dephosphorylated with phosphatase, and the opened vector may
be separated from the T-DNA structural gene found nested within the
TxCS. A plasmid listed in Table 8, column 2 is digested with NdeI and the
corresponding structural gene listed in Table 7, column 1 is isolated as a
fragment whose size is listed in Table 8, column 4 by agarose gel
electrophoresis followed by elution from the gel (e.g. the crystal protein
structural gene is isolated from mBtCP' on a 2.8 kbp NdeI fragment).
Additionally, a thaumatin-encoding fragment may be isolated as described
in Example 4.3. Any desired combination of an opened TxCS vector and an
isolated foreign structural gene may now be mixed together and ligated.
For example, the crystal protein structural gene may be placed between an
ORF 11 promoter and an ORF 11 polyadenylation site, thereby replacing the
structural gene of ORF 11 with that of the crystal protein, by ligating
the 2.8 kbp NdeI fragment of mBtCP' into NdeI-digested mORF 11 DNA. The
ligation mixtures are individually transformed into E. coli, and RFs are
isolated from the resultant transformants and characterized by
restriction analysis. For each transformation a colony is chosen which
lacks the endogenous pRi TL-DNA structural gene and has a single copy of
the heterologous foreign structural gene inserted within the TxCS, the
structural gene and the TxCS being in such orientation with respect to
each other that the gene is expressible under control of the TxCS when
within a plant cell.
6.2 Assembly of plant transformation vectors

A TxCS/heterologous foreign structural gene combination may be
removed from the M13-based vector constructed in Example 6.1 by digestion
with BamHI followed by agarose gel electrophoresis and elution. The size
of the BamHI fragment bearing the promoter/structural gene/polyadenylation
site may be calculated by adding the size of the structural gene-bearing
fragment, as listed in Table 8, column 4, to the sizes of the
promoter-bearing and polyadenylation site-bearing segments, as listed in
Table 6, column 6. For example, an ORF 11 TxCS/crystal protein structural
gene combination, as exemplified herein, may be obtained on a 4.2 kbp
BamHI fragment (2.8 kbp + 715 bp + 705 bp). A TxCS/gene combination may be
inserted directly into a 5'GATC...3' sticky-ended site, which may be
generated by BamHI, BclI, BglII, MboI, or Sau3AI. Alternatively, the
combination may be inserted into any desired restriction site by
conversion of sticky ends into blunt ends followed by blunt-end ligation,
or by use of appropriate oligonucleotide linkers.

An alternative to assembly of a pRi TL-DNA TxCS/structural gene
combination followed by insertion of that combination into a plant
transformation vector is the insertion of a pRi TxCS into a plant
transformation vector followed by insertion of the structural gene into
the TxCS/transformation vector combination. It is advantageous that the
plant transformation vector not contain NdeI sites if the particular
manipulation strategy exemplified herein is to be used. Otherwise the
TxCS/vector combination may be linearized by partial NdeI digestion, as
will be understood in the art.
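The size arithmetic and the compatibility of 5'GATC ends described in this section can both be checked mechanically. A minimal sketch, assuming palindromic sites and the standard cut positions of the listed enzymes; the names are hypothetical.

```python
SITES = {  # '^' marks the top-strand cut
    "BamHI": "G^GATCC",
    "BclI": "T^GATCA",
    "BglII": "A^GATCT",
    "MboI": "^GATC",
    "Sau3AI": "^GATC",
    "HindIII": "A^AGCTT",
}


def five_prime_overhang(site: str) -> str:
    """Single-stranded 5' extension left by a palindromic site."""
    k = site.index("^")
    seq = site.replace("^", "")
    if k >= len(seq) - k:
        raise ValueError("site does not leave a 5' overhang")
    return seq[k:len(seq) - k]


def ligates_to_bamhi_end(enzyme: str) -> bool:
    """True if the enzyme's sticky end matches a BamHI end."""
    return five_prime_overhang(SITES[enzyme]) == five_prime_overhang(SITES["BamHI"])


def cassette_bp(gene_bp: int, promoter_bp: int, polya_bp: int) -> int:
    """Size of the BamHI promoter/gene/polyadenylation-site fragment."""
    return gene_bp + promoter_bp + polya_bp


if __name__ == "__main__":
    print(cassette_bp(2_800, 715, 705))  # 4220, i.e. about 4.2 kbp
    print({e: ligates_to_bamhi_end(e) for e in SITES})
```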

6.3 Vector choice, transformation and plant regeneration

The plant transformation vector into which the TxCS/gene combination
is to be inserted may be a TIP-based system such as a TIP plasmid, a
shuttle vector for introduction of novel DNAs into TIP plasmids, or a
sub-TIP plasmid, e.g. mini-Ti or micro-Ti. Alternatively, a vector based
upon a DNA virus, minichromosome, transposon, or homologous or
nonhomologous recombination into plant chromosomes may be utilized. Any
mode of delivery into the plant cell which is to be initially transformed
may be used which is appropriate to the particular plant transformation
vector into which the TxCS/structural gene combination is inserted. These
forms of delivery include transfer from an Agrobacterium cell, fusion with
vector-containing liposomes or bacterial spheroplasts, direct uptake of
nucleic acid, encapsidation in viral coat protein followed by an
infection-like process, or microinjection.

Transformed plant cells are propagated and used to produce plant tissue
and whole plants by any means known to the art which is appropriate for
the plant transformation vector and delivery mode being used. Methods
appropriate for TIP-based transformation systems include those described
by M.-D. Chilton et al. (1982) Nature 295:432-434, for carrots, and
K. A. Barton et al. (1983) Cell 32:1033-1043, for tobacco. Selection of
transformed cells may be done with the drugs and selectable markers
described in the Background. The exact drug, concentration, plant tissue,
plant species, and cultivar must be carefully matched and chosen for
ability to regenerate and for efficient selection. Screening of
transformed tissues for expression of the foreign structural gene may be
done using immunoassays known to the art. Southern, northern, and dot
blots, all methods well known to those skilled in the art of molecular
biology, may be used to detect incorporated or expressed nucleic acids.
Screening for opine production is also often useful.




6.4 Preparation of a disarmed T-DNA vector

E. coli C600 (pRK-203-Kan-103-Lec), which is on deposit as NRRL
B-15821, is a pRK290 derivative containing T-DNA sequences of pTi15955
from between EcoRI sites at positions 4,494 and 12,823, as defined by
R. F. Barker et al. (1983) Plant Mol. Biol. 2:335-350, except for a
deletion of sequences between the position 5,512 HindIII site and the
position 9,062 BamHI site. Inserted into the deletion, i.e. substituting
for the deleted T-DNA, are a Tn5-derived kanamycin resistance (kan) gene
and a Phaseolus vulgaris seed lectin gene (see Example 4; Hoffman (1984)
supra). The lectin gene may be deleted from pRK-203-Kan-103-Lec by
digestion with HindIII followed by religation; the resultant vector is
designated pRK-203-Kan-103. BamHI-digested, dephosphorylated
pRK-203-Kan-103 is mixed with and ligated to a BamHI fragment bearing the
pRi TL-DNA TxCS/heterologous foreign structural gene combination
assembled in Example 6.2; the resultant vector is designated
pRK-203-Ri-Kan-103. pRK-203-Ri-Kan-103 is introduced into A. tumefaciens
ATCC15955 using methods
well known in the art, and a double-homologous recombinant, designated
RS-Ri-Kan, is identified. RS-Ri-Kan does not harbor pRK-203-Ri-Kan-103,
but contains a mutated pTi15955 in which the T-DNA between the position
5,512 HindIII site and the position 9,062 BamHI site has been replaced by
a TxCS/structural gene combination and a kan gene. This substitution
deletes some tmr and tms sequences, thereby disarming the T-DNA. RS-Ri-Kan
T-DNA transforms inoculated plant tissue without conferring the phenotype
of hormone-independent growth. Tobacco tissues transformed by RS-Ri-Kan
may be regenerated into normal plants using protocols well known in the
art for regeneration of untransformed tissue.

6.5 Construction of a micro-Ti plasmid

p102, a pBR322 clone of the pTi15955 T-DNA fragment between HindIII
sites at positions 602 and 3,390 (as defined by R. F. Barker et al.,
supra), carries the left border of TL and promoter sequences associated
with ORF 1. p233 is a pBR322 clone of the pTi15955 T-DNA BamHI/EcoRI
fragment spanning positions 9,062 and 16,202. The T-DNA of p233 includes a
SmaI/BclI fragment spanning positions 11,207 and 14,711, having ocs, a
3'-deleted tml, and the right border of TL. p233 was linearized with SmaI,
mixed with and ligated to a commercially available blunt-end BglII linker,
trimmed with BglII, religated to itself, and transformed into E. coli GM33
(a dam- host that does not methylate DNA in a manner incompatible with the
action of BclI; M. G. Marinus and N. R. Morris (1974) J. Mol. Biol.
85:309-322). A colony was identified which harbored a plasmid, designated
p233G, having a BglII site in the location formerly occupied by the
position 11,207 SmaI site. p233G DNA was digested with BglII and BclI, and
a 3.5 kbp fragment was isolated by agarose gel electrophoresis followed by
elution. The 3.5 kbp BglII/BclI fragment was mixed with and ligated to
BglII-digested, phosphatase-treated p102 DNA. The ligation mixture was
transformed into E. coli K802 (W. B. Wood (1966) J. Mol. Biol. 16:118).
Plasmid DNAs from ampicillin-resistant transformants were characterized by
restriction analysis, and a colony was identified, designated pAK-4,
having the BglII/BclI fragment of p233G inserted into the BglII site of
p102 and oriented so that the ocs gene was located between the left and
right TL borders. One BglII site, also between the borders, was
regenerated, and a BglII/BclI suture, not susceptible to the action of
either enzyme, was generated to the right of the right border. pAK-4 may
be represented as follows:

pBR322...HindIII...left border...BglII...ocs...right border...
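The fragment sizes in this construction follow from the pTi15955 coordinates given above. A minimal sketch; the variable names are hypothetical, the pAK-4 estimate assumes the 6 kbp HindIII fragment is the p102 insert plus the inserted BglII/BclI fragment, and the few base pairs contributed by the BglII linker are ignored.

```python
# pTi15955 T-DNA coordinates quoted in the text (Barker et al. numbering).
P102_HINDIII = (602, 3_390)          # p102 insert
P233_BAMHI_ECORI = (9_062, 16_202)   # p233 insert
SMAI_BCLI = (11_207, 14_711)         # becomes the BglII/BclI fragment of p233G


def length(span: tuple[int, int]) -> int:
    start, end = span
    return end - start


p102_insert = length(P102_HINDIII)          # 2,788 bp
p233_insert = length(P233_BAMHI_ECORI)      # 7,140 bp
ocs_fragment = length(SMAI_BCLI)            # 3,504 bp, the "3.5 kbp" fragment
pak4_hindiii = p102_insert + ocs_fragment   # about 6.3 kbp, cf. "6 kbp"

if __name__ == "__main__":
    print(p102_insert, p233_insert, ocs_fragment, pak4_hindiii)
```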



The T-DNA of pAK-4 may be removed on a 6 kbp HindIII fragment.
HindIII-digested pAK-4 DNA was mixed with and ligated to
HindIII-linearized, phosphatase-treated pSUP106 DNA. pSUP106, a 10 kbp
wide-host-range plasmid capable of maintenance in both E. coli and
Agrobacterium (R. Simon et al. (1983) in Molecular Genetics of the
Bacteria-Plant Interaction, ed.: A. Pühler, pp. 98-105), is harbored by
E. coli CSH52 (pSUP106), which is on deposit as NRRL B-15486. The reaction
mixture was transformed into K802, and plasmid DNAs from
chloramphenicol-resistant transformants were characterized by restriction
analysis. A colony was identified harboring a plasmid, designated pAN6,
having the Agrobacterium DNA of pAK-4 inserted into the HindIII site of
pSUP106, oriented so that the BglII/BclI suture was proximal to the
pSUP106 EcoRI site. pAN6 is a micro-Ti plasmid having within its two T-DNA
borders a functional ocs gene and a BglII site that is unique to the
plasmid. The BglII site is flanked by an incomplete tml gene and the pTi
ORF 1 promoter, both of which are transcribed towards the BglII site.

BglII-digested, dephosphorylated pAN6 is mixed with and ligated to a
BamHI fragment bearing the pRi TL-DNA TxCS/heterologous foreign structural
gene combination assembled in Example 6.2; the resultant vector is
designated pAN6-Ri. pAN6-Ri may be introduced into an Agrobacterium strain
having a helper plasmid, e.g. LBA4404 (G. Ooms et al. (1981) Gene
14:33-50), using methods well known in the art.



6.6 Inoculation of tobacco stems

Stems of sterile Nicotiana tabacum var. Xanthi are cut into segments
approximately [illegible] cm long. These segments are placed basal end up
in Petri dishes containing Murashige and Skoog medium (MS medium:
1.65 g/l NH4NO3, 1.5 g/l KNO3, 440 mg/l CaCl2·2H2O, 370 mg/l MgSO4·7H2O,
1.70 mg/l KH2PO4, 0.83 mg/l KI, 6.2 mg/l H3BO3, 22.3 mg/l MnSO4·4H2O,
8.6 mg/l ZnSO4·7H2O, 0.25 mg/l Na2MoO4·2H2O, 0.025 mg/l CuSO4·5H2O,
0.025 mg/l CoCl2·6H2O, 37.23 mg/l Na2EDTA, 27.85 mg/l FeSO4·7H2O,
1 g/l inositol, 50 mg/l nicotinic acid, 50 mg/l pyridoxine·HCl,
50 mg/l thiamine·HCl, 30 g/l sucrose, and 8 g/l agar, pH 5.8) without
hormonal supplement, a medium well known in the art. The basal (upper)
ends are then inoculated with Agrobacterium cells by puncturing the cut
surface of the stem with a syringe needle. After two weeks of incubation
at 28°C with 16 hr light and 8 hr dark, calli develop at the upper surface
of all stem segments. The callus regions are then transferred to MS medium
containing 2.0 mg/l NAA (1-naphthaleneacetic acid), 0.3 mg/l kinetin, and
0.5 mg/ml carbenicillin. After two weeks on this medium, the tissues are
free of bacteria and can be assayed for the presence of opines, a
methodology well known in the art.

Once free of inciting bacteria, the transformed plant tissues are
grown on MS medium with hormones at 25°C with 16 hr light and 8 hr dark.
These tissues are cloned using a suspension method described by A. N.
Binns and F. Meins (1979) Planta 145:365-369. Briefly, tissues are placed
in liquid MS medium supplemented with 2.0 mg/l NAA and 0.1 mg/l kinetin,
and shaken at 135 rpm at 28°C for 10-14 days. The resultant suspensions
are filtered successively through 0.543 mm and 0.213 mm mesh sieves,
concentrated, and plated at a final density of 8 × 10³ cells/ml in MS
medium supplemented with 2.0 mg/l NAA and 0.3 mg/l kinetin. After the
colonies grow to approximately 100 mg, each is split into two pieces. One
piece is placed on complete MS medium and the other is screened for the
presence of opines. Approximately 0-50% of the colonies are found to be
opine-positive, depending on the particular parental uncloned callus piece
from which the colonies were descended. Uncloned pieces having higher
concentrations of opine tended to yield a higher percentage of
opine-positive clones.
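Plating at a final density of 8 × 10³ cells/ml is a simple dilution, C1·V1 = C2·V2. The sketch below assumes the cell concentration of the concentrated suspension has already been measured (for example by hemocytometer count), which the text does not specify. The function name `plating_volumes` and the example stock concentration are hypothetical.

```python
from typing import Tuple


def plating_volumes(stock_cells_per_ml: float,
                    target_cells_per_ml: float,
                    final_volume_ml: float) -> Tuple[float, float]:
    """Return (ml of cell suspension, ml of medium) giving the target plating density.

    Uses C1*V1 = C2*V2, so V1 = C2*V2 / C1. Hypothetical helper for illustration.
    """
    if stock_cells_per_ml <= 0 or final_volume_ml <= 0:
        raise ValueError("stock concentration and final volume must be positive")
    if target_cells_per_ml > stock_cells_per_ml:
        raise ValueError("stock is too dilute; concentrate the suspension further")
    suspension_ml = target_cells_per_ml * final_volume_ml / stock_cells_per_ml
    return suspension_ml, final_volume_ml - suspension_ml


# Hypothetical stock of 4 x 10^5 cells/ml plated at 8 x 10^3 cells/ml in 50 ml total.
cells_ml, medium_ml = plating_volumes(4e5, 8e3, 50)
print(f"{cells_ml:g} ml suspension + {medium_ml:g} ml MS medium")
```

With these assumed numbers the sketch prints "1 ml suspension + 49 ml MS medium"; raising an error when the stock is more dilute than the target signals that the suspension must be concentrated further before plating.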

6.7 Regeneration of recombinant plants

Tissues from various opine-positive clones are transferred onto MS
medium supplemented with 0.3 mg/l kinetin and cultured at 28°C with 16 hr
light and 8 hr dark. Shoots that initiate are subsequently rooted by
placing them in MS medium without hormones. Rooted plantlets are
transferred to soil and placed at high humidity in a greenhouse. After
7-10 days, the plants are grown under normal greenhouse conditions.
Regenerated plants derived from opine-positive clones contain opines. The
presence of opines indicates that these normal-looking plants are
transformed by T-DNA.
Table 1. Restriction Enzyme Sites in the pRi TL-DNA Region

Enzyme      No. Sites   Locations (bp)
BstE II     1           3993
Sna I       1           6459
Apa I       2           3390, 17851
Mst II      2           4806, 15021
Sma I       2           3075, 9863
Xba I       2           676, 4999
Kpn I       3           3364, 14133, 19918
Mlu I       3           17606, 20793, 20856
Nco I       3           2262, 10133, 21021
Sst II      3           3431, 14691, 17037
Xho I       3           9242, 11003, 20700
BamH I      4           1343, 11198, 11278, 12816
Hpa I       4           8375, 12459, 13700, 18818
Nde I       4           3519, 3861, 4822, 10308
Nru I       4           5281, 10968, 11617, 18901
Sal I       4           4515, 6047, 12655, 15821
Ava III     5           13684, 14382, 15480, 16415, 18262
BssH II     5           5727, 6847, 19761, 20260, 20660
BstX I      5           2269, 4226, 9912, 16016, 18309
Cla I       5           35, 753, 11421, 12598, 21110
Nar I       5           465, 4114, 11356, 16441, 20385
Nsi I       5           13688, 14386, 15484, 16419, 18266
Sca I       5           1794, 4546, 10166, 11500, 13858
Tth111 I    5           3413, 3816, 8217, 8769, 11369
Xma III     5           5814, 7970, 8502, 10613, 20347
Aat II      6           974, 5615, 6054, 7521, 9272, 19089
Asu II      6           4792, 10026, 12954, 16897, 19418, 19436
Hind III    6           5602, 6361, 9814, 11587, 15827, 17404
Mst I       6           4004, 8091, 11427, 16088, 19690, 20408
Pst I       6           2244, 4892, 7003, 10486, 10533, 17780
Xor II      6           230, 2659, 4480, 5694, 8509, 16962
Bcl I       7           827, 1364, 6710, 10564, 18673, 19403, 19992
Bgl II      7           4197, 5525, 7879, 11239, 13097, 15517, 15760
EcoR I      7           7585, 9077, 13445, 15358, 17059, 18766, 18911
Acc I       8           333, 4516, 6048, 6460, 9514, 12656, 15822, 19089
Bal I       8           497, 3568, 5488, 9233, 9339, 9916, 12001, 17544
Sph I       8           582, 11476, 15013, 15057, 15486, 17175, 19027, 20404
Xmn I       8           1759, 2725, 4498, 4546, 10103, 12206, 17338, 17917
EcoR V      9           5134, 6738, 7775, 10098, 10626, 13173, 14048, 16080, 17491
Sst I       9           1967, 4152, 10879, 11068, 12395, 14105, 17016, 19214, 19866
(illegible) ?           (earlier locations illegible), 16656, 20186, 20467
Bgl I       10          1571, 3125, 5872, 5956, 6832, 9775, 10912, 14290, 16606, 21065
Ava I       11          3073, 3765, 5268, 7012, 9242, 9861, 10573, 10629, 11003, 14402, 20700
Aha III     12          2486, 11334, 12233, 13427, 13580, 13666, 15577, 15599, 16168, 18135, 18573, 20070
Nae I       13          316, 446, 1664, 3931, 3962, 5733, 7616, 9771, 15000, 16622, 18474, 20380, 20652
Pvu II      13          250, 1235, 1859, 2395, 2752, 7888, 8451, 12042, 13715, 15590, 15620, 16056, 18688
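Table 1 gives positions only. For the sites spot-checked against the Figure 2 sequence (Xba I at 676, BamH I at 1343, Bcl I at 1364, Sma I at 3075), each listed location coincides with the last base 5' of the top-strand cut. The specification does not state this convention, so it is an inference, and OCR damage to the sequence may cause discrepancies elsewhere. The hypothetical helper `cut_positions` below scans a sequence fragment for an exact, palindromic recognition site and reports positions under that assumed convention. Degenerate sites (e.g. GGTNACC for BstE II) and non-palindromic sites, which would also require scanning the reverse complement, are not handled.

```python
from typing import List


def cut_positions(seq: str, site: str, cut_offset: int, seq_start: int = 1) -> List[int]:
    """Report 1-based map coordinates of the last base 5' of each top-strand cut.

    seq        -- top-strand fragment, 5' to 3'
    site       -- exact palindromic recognition sequence (no N, W, etc.)
    cut_offset -- bases of `site` lying 5' of the cut (1 for T^CTAGA, 3 for CCC^GGG)
    seq_start  -- map coordinate of the first base of `seq`
    """
    if not 0 <= cut_offset <= len(site):
        raise ValueError("cut_offset must lie within the recognition site")
    seq, site = seq.upper(), site.upper()
    hits = []
    i = seq.find(site)  # 0-based index of the first base of the site, or -1
    while i != -1:
        # The last base before the cut is base number i + cut_offset of seq (1-based),
        # i.e. map coordinate seq_start + i + cut_offset - 1.
        hits.append(seq_start + i + cut_offset - 1)
        i = seq.find(site, i + 1)  # advance by one so overlapping matches are kept
    return hits


# Fragments copied from the Figure 2 sequence (first bases at 661, 1321 and 3061).
print(cut_positions("ATTTCAATATCGGTGTCTAGAGACCCGTGG", "TCTAGA", 1, 661))   # Xba I
print(cut_positions("GTTGAAGGCTCTGATGACTACAGGATCCTC", "GGATCC", 1, 1321))  # BamH I
print(cut_positions("ATGTTGCGGATTCCCGGGAT", "CCCGGG", 3, 3061))            # Sma I
```

The three calls print [676], [1343] and [3075], matching the corresponding Table 1 entries.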
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CLAIMS 

We claim: 

1. A method of genetically modifying a plant cell comprising the step of
transforming the cell to contain a pRi T-DNA promoter and a heterologous
foreign structural gene, the promoter and the structural gene
being in such position and orientation with respect to one another
that the structural gene is expressible in a plant cell under control
of the promoter.

2. A method according to claim 1, wherein the pRi T-DNA is hybridizable
to pRiHRI TL-DNA.

3. A method according to claim 2, wherein the T-DNA promoter is from a
gene selected from a group consisting of genes for ORFs 1, 2, 3, 4,
5, 6, 8, 9, 10, 11, 12, 13, 14, 15, 16, and 17.

4. A method according to claim 3, wherein the T-DNA gene is selected
from a group consisting of genes for ORFs 1, 2, 3, 6, 8, 11, 12, 13,
14, 15, and 16.

5. A method according to claim 4, wherein the T-DNA gene is selected
from a group consisting of genes for ORFs 8, 11, 12, 13, and 15.

6. A method according to claim 2, wherein the T-DNA gene is from pRiHRI 
T-DNA, pRiA4 T-DNA, or a T-DNA essentially identical thereto. 

7. A method according to claim 1, wherein the cell is additionally
transformed to contain a pRi TL-DNA transcript terminator, the promoter,
the structural gene, and the transcript terminator being in such
position and orientation with respect to one another that transcrip-
tional termination of the structural gene in a plant cell is under
control of the transcript terminator.

8. A method according to claim 1, wherein the promoter or the structural
gene comprises an insertion, deletion, or substitution of one or more
nucleotide pairs.

9. A method according to claim 1, wherein the structural gene changes a 
phenotype of a plant or plant cell when expressed therein. 

10. A method according to claim 9, wherein the structural gene encodes an
insecticidal toxin identical to or derived from the crystal protein
of Bacillus thuringiensis.
11. A method according to claim 9, wherein the structural gene is hybri-
dizable to a phaseolin gene.

12. A method according to claim 9, wherein the structural gene encodes 
thaumatin or a precursor of thaumatin. 

13. A method according to claim 9, wherein the structural gene encodes a
legume lectin.

14. A method according to claim 1, comprising the step of integrating the
promoter/structural gene combination into a plant chromosome, whereby
the combination is flanked by plant DNA.
15. A plant, plant cell, plant tissue, or plant seed derived or
descended from a genetically modified plant cell produced by the
method of claim 14.

16. A plant, plant cell, plant tissue, or plant seed derived or
descended from a genetically modified plant cell produced by the
method of claim 1.

17. A DNA molecule comprising a pRi T-DNA promoter and a heterologous
foreign structural gene, the promoter and the structural gene being
in such position and orientation with respect to one another that the
structural gene is expressible in a plant cell under control of the
promoter.

18. A DNA according to claim 17, wherein the pRi T-DNA is hybridizable to
pRiHRI TL-DNA.

19. A DNA according to claim 18, wherein the T-DNA promoter is from a
gene selected from a group consisting of genes for ORFs 1, 2, 3, 4,
5, 6, 8, 9, 10, 11, 12, 13, 14, 15, 16, and 17.

20. A DNA according to claim 19, wherein the T-DNA gene is selected from 
a group consisting of genes for ORFs 1, 2, 3, 6, 8, 11, 12, 13, 14, 
15, and 16. 

21. A DNA according to claim 20, wherein the T-DNA gene is selected from
a group consisting of genes for ORFs 8, 11, 12, 13, and 15.

22. A DNA according to claim 18, wherein the T-DNA gene is from pRiHRI
T-DNA, pRiA4 T-DNA, or a T-DNA essentially identical thereto.

23. A DNA molecule according to claim 17, further comprising a pRi TL-DNA
transcript terminator, the promoter, the structural gene and the
transcript terminator being in such position and orientation with
respect to one another that transcriptional termination of the
structural gene in a plant cell is under control of the transcript
terminator.

24. A DNA according to claim 17, wherein the promoter or the structural
gene comprises an insertion, deletion, or substitution of one or more
nucleotides.

25. A DNA according to claim 17, wherein the structural gene changes a
phenotype of a plant or plant cell when expressed therein.

26. A DNA according to claim 25, wherein the structural gene encodes an
insecticidal toxin identical to or derived from the crystal protein
of Bacillus thuringiensis.

27. A DNA according to claim 25, wherein the structural gene is
hybridizable to a phaseolin gene.

28. A DNA according to claim 25, wherein the structural gene encodes 
thaumatin or a precursor of thaumatin. 

29. A DNA according to claim 25, wherein the structural gene encodes a
legume lectin.

30. A DNA according to claim 17, wherein the DNA is contained within a 
bacterium. 

31. A DNA according to claim 30, wherein the bacterium is E. coli or is
of the genus Agrobacterium.

32. A DNA according to claim 17, wherein the DNA is within a plant cell. 

33. A DNA according to claim 32, wherein the plant cell is within a 
plant, a plant tissue, or a plant seed. 

34. A DNA according to claim 17, wherein the promoter/structural gene
combination is flanked by plant DNA.

35. A DNA according to claim 34, wherein the DNA is within a plant cell, 
a plant tissue, a plant, or a plant seed. 

36. A DNA according to claim 17, wherein the DNA is within a plant cell, 
a plant tissue, a plant, or a plant seed. 

37. A DNA molecule comprising a heterologous foreign structural gene and
a pRi TL-DNA transcript terminator, the structural gene and the
transcript terminator being in such position and orientation with respect
to one another that transcriptional termination of the structural
gene in a plant cell is under control of the transcript terminator.
38. A DNA according to claim 37, wherein the transcript terminator is
derived from a gene selected from a group consisting of genes for
ORFs 1, 2, 3, 6, 8, 11, 12, 13, 14, 15, and 16.

39. A DNA according to claim 38, wherein the T-DNA gene is selected from
a group consisting of genes for ORFs 8, 11, 12, 13, and 15.

40. A DNA according to claim 37, wherein the DNA is within a plant cell, 
a plant tissue, a plant, or a plant seed. 
GGCCGCAGGATTTCGTTC6TCGTGCGTGAT6AGATCGATAAATGTTTATCGACGAGGACA 60 

AGATCGACGATGCGGTTCTTGCGCTGTTGTAGTGACGCTCCACAACGAGTGTTGCGCCGT 120 

GAAAGGCTTTGACTGGGCCGCGACGGACCGCCTTTGCAGGAAGGGTTCGGTCGGCGATCC 180 

CGTCAATAAATCGAAGCTATTGATCCTGACGGATAAAGGTCTGCGTCGATCGGAGGAGCT 240 

ATTCCGACAGCTGTTTACGCGCTAGCCATTGGCCGACGGTCTTTGCGCCCTCCATTCCCA 300 

CGGCGTAGTTAATGCCGGCGGGGACGGGAGTGTCTACTATGTGCAAGCACGTCGGCGAAC 360 

CATGCCTTCGGATTAATGTCGTTCAGACGGGCGGTCGTAAGTTGAATGAGTATGACTGCC 420 

GCATGGTCAGCGCCGCGTTGGGAGCCGGCAGATGTCCAGTCGCGGCGCCTCAAGGCCATC 480 

ACAT6TTCACTCTGTGGCCAGAAGGCGTCGCTCCTTGGGTGGCAGGATATATTGTGATGT 540 

AAACAGATTAGATATGGACATGCGAAGTCGTTTTAACGCATGCTTTATCGAATATAAAAT 600 

GTAGATGGGCTAATGTGGTTTTACGTCATGTGAATAAAAGTTCAGCATTCGTTTAATAAT 660 

ATTTCAATATCGGTGTCTAGAGACCCGTGGATTTGTATAGTCAGCACCATGATATGAATC 720 

TATAAAATATTGTATCTCCAATTGCAATTCAATCGATATAAGAAATTAATACAAGCCGTT 780 

CATATAGTAAGGTTGCCAATGGCATTCAATAACGACCGTACAGTTGCCGCTATATTAATC 840 

TACGTGCCATTTCTTAAATAAAGATAGGCGAATGACTATCGAAAATAAAACAATTATTAA 900 

TGAGTGAAAACGTATTGCACAAATAAAGATTCATTATGGTTGGCTCAAATTTTGGCTCTG 960 

GTGCTCGATGACGTCGAGATGAGGACAGTAGTGATCAACTTGGCGGTCGATACCTTGGTT 1020 

ACGCCACTCCCAGAGTGCCATGTCGTCCTCCGAGCGGTCTGAGATAACCCAGTCGGCAAT 1080 

TGCTGCTGCATTGCCGGGCGTTCCCCAACCACGACGAATATGCTTTCGTTCATCTAACTC 1140 

GCGTCGCACTGCCCTCCCAGTCATGAAGTCAAAGCCAAATTCTACCCTCTCTCCATTTCC 1200 

CAGCTCAGTCGAGAAATCGTAACACCTCGTGGCAGCTGACAGTTTCAGAAAGGGGCGTAT 1260 

CCCTCGAACTCCAGGGTCCTCTTTCACATAGTTAGCAAGGCGTACTGCTGCATAATCTGC 1320 

GTTGAAGGCTCTGATGACTACAGGATCCTCGGACAAGCCCAATTGATCAGGGCGAACCCT 1380 

CGCGCTCATAATATGAATTGCGACGACCCTTGCTTCCTGTCGGAGCATCGAATCAATCCA 1440 

AGCCTTCCCTGCGGCATAGAGGTCATCGACTGCGATGTCA7CAAGATCGAGTAGCTTTGC 1500 

CAACCTAGGAAGTTClTGAGGAAAAATCACCGGCATGACAGCAACCGTCTCTCGCCAGTC 1560 
AGTTGCCGGACTGGCTTCCCTAACGCCATCCACGAAT6CCTCACCGCTTGCGTATTTGAA 1620 
TGTGTAAAAGAGAAGGACCACTCTTTGGCGGTACTTCGGACGCCGGCTTAGCCACGCGGC 1680 
AATAATGTGGGCCTCAAACTCACGACCATCCAAAAATATAGTCGCGCCTGGATTGACCTC 1740 
GCTGGCCTTGTCGAGAAGAGGTTCCAAAAAGGGAACGGTGTCTTTCGTAATAGTACTTAA 1800 

ATCTGTGAGTTCGCCATGCGAAACCTCTCGAACGATTATCGGCGTATCCCTGACATCAGC 1860 

TGAATGAAATTCTCGGACGAGTTTGTCGGGCAAAGTGGAGACCCGCCACGTGTTGAAGTC 1920 

GTGGGAAACGATGGGCACATCGTCGCCGGTGAGTGCGGCATCGAGCTCAGAGAGGTTCCG 1980 

CCTGCCAACCTCACCGAGAGCAGCTAACAACGAAGTTTCGGTGCATTCCTGTATCCCTTT 2040 

ACCCAGATTATACATGCCCCGGTGTTCGATAACTTGAAGAGGCAGTGGCTCCTCAAGATG 2100 

TTCAAGGAGGTGGGGTACAGAGTGCCGGGCGAGGACCTCATCCACCGTGACACCAACCGG 2160 

GAGATCCCATTCGAGTTTCCACTGGGGCCAGCATGTGCCCGCGACGGCGAAAGGTTTGCG 2220 

CTGGCAAAGAACCCGGCTGCTGCAGGTGGACCTATCCTTACCCATGGCAATGGGGTTTTG 2280 

CTAAAAAGTCAGGCACTTTACTGGGCAATTGATAGGGTGGGATTGCGTTATTAACTGTTC 2340 

TCCAGCGGGAATCTTTATCTTTATTGAAATGCTAAAGCACTTAGATAAAATACAGCTGTA 2400 

CCGCAATATAAAATAGTAGGATAATGTAATATGTGTATCGAGAATACGACAAGCTAATAT 2460 

AATCTAGCGTCAAATTGCAATAATTTAAATCAAAACTACTGATGAAATAATAAAAGATGG 2520 

TCAATTTTTATTGGTAGGAGTTGTCGAAAGATTCGACGGACGGCCATTACAATACATAGG 2580 

TGCAAGAAGTAAAACAGGAAGGGAAACGGAAAACAGTGCTATAAAAAAGCGACAGATCGC 2640 

GGCGATCACTGACTGCGATCGGGAAGAAGCTCGCCAAGTTCACCGAGAATAGCAGAGAGC 2700 

GCATCCTCATCGGGTACTACGAACACATTCGTCCCAGAGGGCTTTGTTTCAGCTGCGCCA 2760 

ACCCAGAAAGCAAGGCCATTTTCCAAGTTGCCGATGGCGGTCAGCATGTTTTGATTGTTG 2820 


CTGCCGTTTCCACAAGCGATGTGAAGGCCGATCCCGTGAGAGAGGCCCTTGACGAAGGTG 2880 

AAATAGCCT7TGGATTTTCCAACTGTTTCAACGGGCACTA6ATATTGACCCTCTGGCGCG 2940 

GCAACCACCTTGAATTTGCGAGATGACTGGTTGCCGATGAGCGAAGAAAGCA7TTCTCCG 3000 

GCTTCTTTGTAAGATTTGTGAGATTCCCACATTTGACAGCCGTAGAAATGCCCCATCGGA 3060 

ATGTTGCGGATTCCCGGGATGCCACCAAA77TG7TCTCCATAGCCGCGTGAACGGC1TGC 3120 
CAGTTGG6CAGGGAGAAAGAATCGAAGCGATCATCTTT6TAGATCGT6ACCATTCCATCA 3180 

TtTCCCTGGAATCCGATATTTTCAATGGCGCTGAAAACTGACCTTGCGATTTCTTCGCAT 3240 

TCCCGTGCGGATGTGAGCAATTGATAATGGCCCTTGCAGGCGATCCTGGTCAAATTGGCG 3300 

ATGATGTTGATGGCAGGATTAATATCCCAACACTGGTGATTTCGATCTTGCTTAAAGGTG 3360 

GTACCATCGCCGTCGAAGGCGAGCAGGGCCCGGAGAGATGAATCGGCAAGACTGCGTCGG 3420 

ACCCGCTCC6CGGCGTCGGGAATGAGGCTGATAAGAGACATATCCAAAGGTGTTTGTGGG 3480 

TAACGGGCTGCTCAATGAAGCCTTAAATGCAACGCAACATATGTAAGGATGAGTTGACTT 3540 

ATTGGAGAGAGAAATAGGAATGAGCTGGCCAGCCATTATCAACGTGGGGCCATGCTGACA 3600 

ATGTTTACGTGAAAGGCTCAACTACCTCGAAGCAGACCTCTATATTCGTTGACTTTATTA 3660 

CTGAACAAGAAGTTGCTTGCCACTCATTTTCTTAAATCTTGCCCTTTCTGCGCCTCGCTA 3720 

TCATGCCCGCCAACGACGCGACATGCGCTGCCGCGATTGCCTTCCCCGAGGGCAACTGGA 3780 

AGGAAGAACTTGATGCGCTCCGCACCTTGTGTGACCCCGTCGAGGTGGTTAAGGTCGCAG 3840 

TCGGCAGAGGTCTTAGCGGCATATGTAATGTTGTTGCAGCAATGAATCCCACAAAGGTGA 3900 

GGGGCCTCGGCGATGTCATCGGGCAGATGCCGGCTCTTAATCACCGTATTGCTGCCGCCG 3960 

CCGGCGAAACTCCGGTGCGAGACCTTGGAATAGGTTACCAGTGCGCAATCTGCCACCCCG 4020 

ACATAGCCAGTGCGATGTTAGCCACTTCTGAGGGGATCAGCCACGTTCTCCGTGAAAGGA 4080 

TTGAGAAAGAAGTTGACCGGGACATTGGAGAAGGCGCCACCGTCTGCATTTTCGTTCAGC 4140 

CGAGAATGAGCTCCAAGGGCTCTCCAGTTTCTGTCCATTTCACCCTCCAGTTTGCGAGAT 4200 

CTGGAACTCTTGTCGATGCCAGAATGATGGAGAGTTACAATTTCATGAAAGGCAATGGCA 4260 

CAGTGACCGCACCGGATTTGAAAAGTCATTGGAAGAAGCACGGTATTGACAGGCCAGGCC 4320 

CACGTCCGCCCACGTCCAAGTTTGAACTCCTCTTCGCCGCTGTCCCCGACAACAGTAAAC 4380 

TTGCCGCCACCGATTTTACCCATCTCGGCCCTGTCGAGCGTGATAAGGAACTACTCGGCA 4440 

GCACGGTATTCGGGATTGCCGCTAAGAAACCTGGTACGATCGTTTATCCGTGCGAAAAGG 4500 

TTCTCTGTTTGGAGGTCGACGTACACGCGCATCGCGCCCTAGAAGTACTTCACCGCCTTG 4560 

GGGAACAGGCTTATAGCAATGGCCGTGGCACTAGCTTCGGTCTTCACACCGGTCCGTCCT 4620 

CTTGCCTTAATCTTTCCGCCGCCGCGCTCGCTACATTTTTCAAACGCTCGGA1CTCTGTT 4680 
CCCTTCCATTGAGTGATGCTTTTGTCCTTTTCTGCGACCCGCCACCGCCTACAGCGCCAA 


4740 



GAAAGATGGCCTTCCGATCACTGCCTTCTCCCCCACGAGCACCAATCAGTTCGAACTCGT 


4800 



AGAGCCTCAGGTCGTCAAGGCATATGTTCTCGGACTTTTCGACGCGCCGACGATGGTTAC 


4860 



GCCCCGCGACAAAACGCGAGCCAGCTTCTGCAGCCAATATGTACGTTTCCGTGAACCGCA 


4920 



TCCCTGTGAAGAGTTCAATGAAATTGGAGTTTTGATCCTCGATGCTGCTGCTAAAATGCT 


4980 



CGAACGTTATGCAAAATTTCTAGAAGATGGTGGAAGAGATGATGATGAAATGGCGAACAT 


5040 



AATAGATGTATTTGGGTTTTGTCTTAACTAGTGGATTGATTGAAACAAAGGAGTCCGAGT 


5100 



TGGGATTCCCTTTCGGTCTTCGTCGTGCAACGATATCGTATGCGTACAGGTATCACATTT 


5160 



AACGTTGCTGCGGCGGACCGAGCCCGCTTGGAAGCGATTGTTGCAGCTCCAACTTCTGCT 


5220 



CAGAAGCACGTGTGGCGAGCGAAGATCATCTTGATGAGCAGTGATGGCTCGGGAACGGTC 


5280 



GCGATCATGGAGGCAACCGGTAAATCCAAAACCTGTGTCTGGCGCTGGCAGGAGCGCTTC 


5340 



ATGACTGAGGGCGTCGATGGCCTTTTGCACGACAAGAGCAGACCGCCCGGCATTGCGCCG 


5400 



CTTGATGGCGAACTCGTTGAGCGIGICGTCGCACTGACGCTTGAGACGCCTCAAGAGGAA 


5460 



GCAACGCACTGGACTGTTCGTGCGATGGCCAAGGCCGTTGGGATTGCAGCCTCTTCGGTT 


5520 



GTGAAGATCTGGCACGAGCATGGTCTTGCGCCGCATCGCTGGCGCTCTTTCAAACTGTCG 


5580 



AACGACAAGGCCTTTGCCGAGAAGCTTCACGACGTCGTTGGCCTCTACGTCTCGCCACCG 


5640 



GCCCATGCCATTGTCCTGTCCGTCGATGAGAAGAGCCAGATCCAGGCACTCGATCGGACG 


5700 



CAACCGGGACTCCCCTTGAAGAAAGGGCGCGCCGGCACAATGACCCACGATTACAAGCGC 


5760 



CACGGCACCACCACCCTATTTGCCGCCCTCAACATCCTCGACGGCTCGGTGATCGGCCGA 


5820 



AACATGCAGCGTCACCGGCATCAGGAGTTCATCCGTTTTCTCAACGCCATCGAGGCGGAA 


5880 



CTGCCAAAGGACAAGGCCGTCCACGTCATTCTCGACAATTACGCGACCCATAAGCAGCCG 


5940 


AAGGTCCGCGCCTGGCTGGCAAGGCATCCGCGCTGGACCTTCCACTTCGTCCCAACATCA 


6000 



TGTTCATGGCTGAACGCCGTCGAGGGATTCTTCGCTAAATTGACACGTCGACGTCTGAAG 


6060 


CACGGTGTCTTTCATTCCGTCGTTGACCTCCAGGCCACCATCAACCGCT7CGTCAGAGAG 


6120 



CATAATCAGGAACCAAAGCCGTTCATCTGGAGAGCAGATCCAGACGAGATCATTGCAGCC 


6180 


GTCAAACGTGGGCACCAAGCGTTGGAATCAATCCACTAGCGTATGAACAGTAATAAGAAA 


6240 
ATCCCGATTGTGAATAGTCCCAATTTCAW6T6TCCGTGT6TAATTTGCGTGTCTTCAG 6300 

TT6AATTTCCTTTAATAATATCAAATATTCAATTGT6AAAAGTTGTATTGGTTCAGGTTC 6360 

AAGCTTTCCGAATTTGTTGAATTTTATTCCCTGTTTTCAATTTGTTGACTTGTTTGGGA6 6420 

ACACCTTTTTTGTGTTTCGTGAACATGTCACCCCTTCGGTATACATTAGCCTACAAAGTA 6480 

AATAACGTTGATAAATGTCACTCATGTTGTAATAAAATTGAGCTTATTATGTATAACCAG 6540 

ACCCTGTGTTAATCTAATTACAAAGAAATTCATCATTCTCCCAAGCAATCCTGAGTAGCT 6600 

GCGTGATGGATCTTCCATATCAGCGCCCACGTTTCACCCCGTTTGCCGTCACCCATCCAC 6660 

GTAGTGGAGTCAACCTGAACCGTGCAATTTCTCAGGCCTTTGTCTGCTATGATCAGTTCT 6720 

GCGAACGGCTCTTGCGATATCAGCAAAGCTGGACGGATTGGGTGTTCGACCACGGATTTG 6780 

CAGAAGCCATTGAAGACGTGGCGCTGGTGTTCCAGGTTGCACCTTGCCTTCATGGCCCCC 6840 

GAATAGGCGCGCTCGAAGTGTTGATACCTCGTCGCACCCAGGTCTTCATTTATATGTCGA 6900 

ACAACCAATTGCAGCGCTTTGTTGCACACCAGTGCATTGCTCAACTTGGCGACGCCGTGC 6960 

TTGCTTGCATGATCCCGCCCTACGCGAGTGACCTCTCGCTGCAGGAAATGGCTCGGGCGC 7020 

ACAACAGATTTTGCCCAGGCAGTTACACGAGGTCCGCAGACGTACAGTGCTTTATCGCCA 7080 

TCCAACTCAGCAGCCGATTCGTTGAGGAGGGCACATGTAACGTGCACGGGCGAAATGGCT 7140 

TAAAAAGAACCTGCCGCTTCTTTCGTCGCCCTGCTGAGTTCTTCAGCCGTTATGACATCG 7200 

TTGCCATTGGGCCGGTGCTCTTCCATGATGAACTGGATTGCCCAGCAAACTGCAATGAGC 7260 

CTCTTTCCTGCTTTGACCTGCGGTACGACTATCAGGTTTTCCTCCAGGAGTGCGATGCCC 7320 

ATGATGGTGTGGGGCATTATCCGGAAGGCGCACCACTACCTAGTGTTGCCATCGTAGGAG 7380 

GCGGGCTGTCTGGCCTTGTTGCTGCCACAGAACTACTTGGCGCTGGCGTCAAGGAAATCA 7440 

CTCTTTTCGATACCGTTGATGAGATCCGTAGTTTTGGGGCATCGCCGATGCCAAACGGCG 7500 

ACGCTCACCAGGCCTTGACGTCGTTCGGTGTCATGCCTTTCTCCGCCAACCAACTTTGCC 7560 

TGTCATACTATCTGGATAAGTTTAGAATTCCGTCCAGCCTTCGTTTTCCTTGTGCCGGCA 7620 

ACGACCACACAGCACTATATTTCCGCCAGAAACGCTACGCATGGCACGCGGGGCAAGCTC 7680 

CGCCGGGGATATTTCAGCGGGTACATGTCGGATGGAAGACACTACTCTACCAAGGGTGTG 7740 

AACGGAATGGCAGGAGACTGATGGCTCCGATGGATA7CTCTTTCATGTTGAAAGAGCGTC 7800 
GTC6T6ATGAAGCCTCAGAAGCACGGCAGCTTTGGCTCCGAGAGTTCGGAAAATTCACTT 

TCCATGCCGTTTTGGTCGAGATCTTCAGCTGTGGTAATTCGAGTCCTGGTGGCAAGGCAT 

GGCAAACACCCCATGATTTCGAGGCTTTCGGGATACTGAGGTTGGGATACGGCCGAGTTT 

CGTCCTATTACAACGTGTTGTJTTCAACGATCCTGGACTGGATTATCAATGGCTACGAGG 

AGGACCAGCATCTTTCTATTGGTGGGGTTCAACTTTTGCAGGCTCTGATGCGCATTGAAA 

TATTCCAGAAAAGCCATGCGAAAGCACGACTCTGTTTTGATCCCGTGCGTGGAATAGCCA 

AGGAGGGCGGGAGATTGAAGGTATGCTTGAAACACGGTCATTCGCGTGTTTTTGACCAGG 

TCATCATTGGCGGCAGTGCTGAGGCCGCTACAGTTGATAACAGACTGGCCGGGGATGAGA 

CTTCCTTCAGCTACAATATCGAACCCGCCGTCGGAAACTCGTCTGCCGCTGTCAATTCAG 

CACTCTTCATGGTCACGAAGCAAAAGTTTTGGGTTAACTCCGGCATCCCAGCAGTGATAT 

GGACCGATGGGCTTGTCCGTGAGCTGTGTTGCATTGACATCGAATCGCCAGCTGGAGAGG 

GCCTTGTCGTTTTTCACTATGCTTTGGATGACTATCTATCCCGGCCGATCGAGCATCATG 

ACAAGAAGGGACGGTGCTTGGAATTGGTCAGGGAGCTTGCTGCTGCCTTTCCTGAACTGG 

CTTGTCACCTGGTCCCAGTCAACGAAGACTACGAACGATATGTCTTCGACGACCACCTAA 

CGGATGGTTTTAAGGGAGCTTTGTGGAGGGAAAATTCTCTGGAAAAAGGTCAGTATATCC 


AGGATCTGCCTGGGAATAATTTTCCTATTGGGGATCACGGGGGAGCCTATCTGATTGACC 
GTGACGACTGCGTCACCGGAGCCTCGTTCGAGGAGCAGGTGAAGGCGGGCATCAAAGCGG 
CCTGCGCCGTCATCCGCAGCACCGGCGGGACGCTCTCTTCACTCCAACCGGTGGACTGGA 
ATAAAAAATAGAAATTTCCTGATTAAGTTATAGTCAATGTACTATTGCGTGTTAATCCCG 
TAGGTATGCAAGCTGCACCGGCAGCATCATAATTTGATGTTCCATCAATAAATTAAGGTG 
CCCGTTCATTGTGTATTACATTATGTATGTTTATCAAAAATATAATCGAAGTCCATTTTA 
AGTCTGATATTAATTGGAATTCCAAACGATTCCTTGATGCCTATCTTCGCTATGATTGTA 
TGGTAATAAAGTCTCCACATCTCCCGAAAAATGCTTTCGTGATTTACTTGTCTCTCACGT 
GCTTTCGCATCTTGACAGCCAAAAGTGGGCAACTTGAGAAGAGTATTAACTGGCCACGCA 
ACTCGAGATATTCCCACTAACCCCAATGACGTCATTGCACTCGTCACGGGTAGCAGCCCC 
ACTTGCCTTTGCCACT7TATTAATTCT7TGGCCCACTGGCCATTAATTGGCACC1ACATA 
TATTAGTGGAGAAGATMAGTGTCACTATCGTTTCCTGTTCAATTTTGAATTTTGCAAGG 9420 

ATTTCATGTTGTCAACTACACAGCTTGAAAGGAAATCCGCAATCAACGGAGAAACGTCAA 9480 

CATCTCGACAAAAAAAGAATGCTTCATCATTGCGTAGACTGCATATTGACCGCTCCTTTC 9540 

6GCGCTGGGCCTGCTTTTACTGTT6CCTAGC67TCGGACA6CCACCA6A6AAfGGGCTAT 9600 

ATAGATCCTTTCATCAAACCAAAACATTACTAAGATCATGCTGTAACGCTTCAATACGGT 9660 

GAGTGTGGTTGTAGGTTCAATTATTACTATTTTTGAAGCTGTGTATTTCCCTTTTTCTAA 9720 

TATGCACCTATTTCATGTTTCAGAATGGAATTAGCCGGACTAAACGTCGCCGGCATGGCC 9780 

CAGACCTTCGGAGTATTATCGCTCGTCTGTTCTAAGCTTGTTAGGCGTGCAAAGGCCAAG 9840 

AGGAAGGCCAAACGGGTATCCCCGGGCGAACGCGACCATCTTGCTGAGCCAGCCAATCTG 9900 

AGCACCACTCCTTTGGCCATGACTTCCCAAGCCCGACCGGGACGTTCAACGACCCGCGAG 9960 

TTGCTGCGAAGGGACCCTTTGTCGCCGGACGTGAAAATTCAGACCTACGGGATTAATACG 10020 

CATTTCGAAACAAACCTACGGGATTAATACGCACGTGGCTGGCGGTCTTCGATTCATTTC 10080 

CACGCCGGAGATGATATCGAATATGTTCTGTTAAGTTAAAATAAGCTGCGAGCCATGGCG 10140 

CGATTGTCCTGTTTTATTAATATAGTACTTTAACGTCTCTTTAGAGCGTTTGTGTAATGT 10200 

CGTGAAAATGTTTTATGTCAAATGTACTGTTGAACTATAATATTATAAGTCCAGGTGTGT 10260 

CGTTGTTGTTGATACTGCAATATATGTGTAGTAGATTAGATAGTCATATGAGCATGTGCT 10320 

GTTTTTGGCAAAATTCAGCAGCAGGATCAACACAGAAGAAAATATTTAGTACAAGAAAAT 10380 

AGGTCAACACATTACAACGTACGCTACAACTCCCAAGGTTCTGTGTCACAGACTGCGGGA 10440 

GGGTACATAGAACTTATGACAAACTCATAGATAAAGGTTGCCTGCAGGGGGAGTTCAAGT 10500 

CGGCTTTAGGCTTCTTTCTTCAGGTTTACTGCAGCAGGCTTCATGACGCCCTCCTCGCCT 10560 

TCCTGATCAGGCCCCGAGAGTCGCAGGGTTAGGTCTGGCTCCGGTGAGGAGGCGGCCGGA 10620 

CGTGATAtCCCGAGGGCATTTTTGGTGAATTGTGTGGTGCCGCAAGCTACAACATCATAG 10680 

GGGCGGTTTTCAGTCCCTCGCCGCAGAAAGAAGGTGCAAGCTACCTCTCTCCCGTAAACG 10740 

TTGGTCACTTTTAACTCCAGCAAGTGAATGAACAAGGAACTTGCGAAAATGGCGATGAAG 10800 

CATTCTAAATCAGGTTCCTCCGTGCGGCTGTGCGGCCAAGCAAGGTTGTGAACACGGAGC 10860 

ATCTCCTGGAGGGCGAGCTCGCTCCGATATGGTTGAATCGTTGTCGCCAGCACGGCCTCC 10920 
ATTCCAAATGTAATGGATTGTTCCTTCAGCACTTTCTGCATCTTCTCGCGAGAAAGATAG 
ACAAATACATGTTGGTCGTTTTCTCGAGCCAGATCCGGCTGACTAACAAACATAGGAGGA 
TGATAGCAGACTTtGTTCTTCAAGAGCTCAGCTAGTTGTTTAAGTATATATATCGGTGGA 
GAGTTTTCCTTCAAATCTAGCACTGCAAGAGCCCATCGTTTCTGGAAATGCAGGAGGGGT 
TTGCTATAGTCACGGCTATAGATTGCAAAAGCAAATCGGATCCCCTCGAATAGGTTTATC 
TGGCTCCATGCTGGAGTGAGATCTACTGGTTGAAATCGTGGAAGGAATAGCAATTTGGGA 
TCCATTGTGATGTGAGTTGGATAGTTACGAAAAAGGCAAGTGCCAGGGCCATTTAAAATA 
CGGCGTCGGAAACTGGCGCCAATCAGACACAGTCTCTGGTCGGGAAAGCCAGAGGTAGTT 
TGGCAACAATCACATCAAGATCGATGCGCAAGACACGGGAGGCCTTAAAATCTGGATCAA 
GCGAAAATACTGCATGCGTGATCGTTCATGGGTTCATAGTACTGGGTTTGCTTTTTCTTG 
TCGTGTTGTTTGGCCTTAGCGAAAGGATGTCAAAAAAGGATGCCCATAATTGGGAGGAGT 
GGGGTAAAGCTTAAAGTTGGCCCGCTATTGGATTTCGCGAAAGCGGCATTGGCAAACGTG 



AATCATATTAATTAGTAACATTGTATATGTAATATAGTGCGGAAATTATCTATGCCAAAA 
TGATGTATTAATAATAGCAATAATAATATGTGTTAATCTTTTTCAATCGGGAATACGTTT 
AAGCGATTATCGTGTTGAATAAATTATTCCAAAAGGAAATACATGGTTTTGGAGAACCTG 
CTATAGATATATGCCAAATTTACACTAGTTTAGTGGGTGCAAAACTATTATCTCTGTTTC 
TGAGTTTAATAAAAAATAAATAAGCAGGGCGAATAGCAGTTAGCCTAAGAAGGAATGGTG 
GCCATGTACGTGCTTTTAAGAGACCCTATAATAAATTGCCAGCTGTGTTGCTTTGGTGCC 
GACAGGCCTAACGTGGGGTTTAGCTTGACAAAGTAGCGCCTTTCCGCAGCATAAATAAAG 
GTAGGCGGGTGCGTCCCATTATTMAGGAAAAAGCAAAAGCTGAGATTCCATAGACCACA 
AACCACCATTATTGGAGGACAGAACCTATTCCCTCACGTGGGTCGCTAGCTTTAAACCTA 
ATAAGTAAAAACAATTAAAAGCAGGCAGGTGTCCCTTCTATATTCGCACAACGAGGCGAC 
GTGGAGCATCGACAGCCGCATCCATTAATTAATAAATTTGTGGACCTATACCTAACTCAA 
ATATTTTTATTATTTGCTCCAATACGCTAAGAGCTCTGGATTATAAATAGTTTGGATGCT 
TCGAGTTATGGGTACAAGCAACCTG1 TTCCTACTTTGTTAACATGGCTGAAGACGACCTG 



10980 
11040 
11100 
11160 
11220 
11280 
11340 
11400 
11460 
11520 
11580 
11640 
11700 
11760 
11820 
11880 
11940 
12000 
12060 
12120 
12180 
12240 
12300 
12360 
12420 
12480 






FIGURE 2, Sheet 9 






/ 16 0204590 



TGTTCTCTCTTTTTCAAGCTCAAAGTGGAGGATGTGACAAGCAGCGATGAGCTAGCTAGA 12540 

CACATGAAGAACGCCTCAAATGAGCGTAAACCCTTGATCGAGCCGGGTGAGAATCAATCG 12600 

ATGGATATTGACGAAGAAGGAGGGTCGGTGGGCCACGGGCTGCTGTACCTCTACGTCGAC 12660 


TGCCCGACGATGATGCTCTGCTTCTATGGAGGGTCCTT6CCTTACAATTGGAT6CAAGGC 12720 

GCACTCCTCACCAACCTTCCCCCGTACCAGCATGATGTGACTCTCGATGAGGTCAATAGA 12780 

GGGCTCAGGCAAGCATCAGGTTTTTTCGGTTACGCGGATCCTATGCGGAGCGCCTACTTC 12840 

GCTGCATTTTCTTTCCCTGGGCGTGTCATCAAGCTGAATGAGCAGATGGAGCTAACTTCG 12900 

ACAAAGGGAAAGTGTCTGACATTCGACCTCTATGCCAGCACCCAGCTTAGGTTCGAACCT 12960 

GGTGAGTTGGTGAGGCATGGCGAGTGCAAGTTTGCAATCGGCTAATGGTTAGTCGATGGG 13020 

CTGACGAGTTTGATGTCAGGAGAAGCTGAGTGTGTCACTTGTTTCCCTTTAAGAAGTATT 13080 

AATGTAATAAAAATCAAGATCTGGTTTAATAACTGGATACTTGATTTCATCGCGCTTTTT 13140 

TTGAATAAATGTTTGTTGTCTTGACTTTAAGATATCCTTTGAAATTTGCGTTATTCGTAT 13200 

TTCGCTTTTGGTTATTTCCAAAAGACTTTGCTCAGTAAGATCAAACGTTTGTATTTCTCC 13260 

GGGCCACAATATTTGACCTATATGCACTGGCCCACGCGCCGCAATAGATGAAAATTGCCA 13320 

AAATTAGCTATCGGTCTTCTGAAAAGAAGGGCCGACATGTTTTCATAGACCATGCAAAGT 13380 

CATACTACCTGAAACTGATAAATAACGACAAAGAAAGTAGCCTATTTAAAAGTCGCTATA 13440 

GCATGAATTCAACACAAGGAAACCAAAAGTCGGAAGGAAGACTTTAATCCCGGATTATTT 13500 

GGACATGATAGGAGCTATGGGGCAACGTGTCATTTTCATGAGTGTTGAATGATTTTCTGT 13560 

AGCAAATAGAAAACGTTTTTTAAAACGATGTGGCCTTGGAGTAATCAGCGGAAGAAATGG 13620 

TCATGCTCAGATAATTTCCGTTGCTGACCTCGCAACCAACCCCTTTAAATACCTCTGCTG 13680 

CCCATGCATTTTGCCAAGTTAACCTAAAGTGGCAGCTGAATGGCTCGTTATTGCAGTGGT 13740 

GGCTCTCAACGGCTTCATGTCGATGATTTTCGTTGGATCAAGGAGCCCACTCGACTGAAG 13800 

GCTCAGCTTATTAATGTGGTGGAGACCTACAAGGCTGCACAAACAGAGACGTTAAAGTAC 13860 

TATATATCATCTGCAACTGAGCGTGTGGCTCATGTGGAGGCAGCCGAGGTCAACAATGCG 13920 

GAAATGGAGCTGCATCCTGCTGGGTTGAAGTACCCTCTGTCCTTCGTCTTTACCTCCCTG 13980 

GCCGTGGCTACAGCCTGCAAGGAGAACAAGCATCTCTTGTGCGAGGAGCAT1 TGGAGGGG 14040 
6ACTTGATATCGTGCGTCGTTCCTCCCTATCAGACAAATGTCTCACTCGCTGCTTTAAGG 
GAGCTCCACAATTCCATTTCGGGAGGAGGGTACCAGGAACAAGCAGACATGGATTATTTT 
GTGGCGATCATCCtAAATGATAATTTCGACTATCAGAGCTGCGAAATCGACACACGAAGT 
TGCGGTAAAGGACTTTGCAAGATTTATAGTAGGGAACTGGGAGGGCAGCCTCTAGCTTAT 
GACGCCATACTGGCAATCGGCAAGGTGCTGCTGCTGGAATAGATAGTGGGCCGCTGATCC 
GAGTTTGATTTTGTCGTATTATGTTACGTGAACTTTTTATCATGCATGTTTCGCTTATGC 
TCCCGAGTGTCGGCCATGTTGTTGTGTTAAAATAAAAGGCTGATGTTAAGTCCTATTGTA 
AAATACCTTTATAGATTAAATATATATAGTATAACTTCTGTATGCCGTCGATGAGCGGTT 
ATATGATTGTAATCTATACGTTGTTGCAATCAATCGTATTACAGTGAGCCGTGCTTAATG 
AAATAAACATCATGTTAAATGTCTATTTATTCAATCAACATGCGCTGACAATAATCAAAA 
GGGGAAACGTAATAACATTGCGGTGGATACAGCGTTTATTGGGAGGTCCGCGGGCCGATA 
CACTTAAATAACATAGACAGAATTTGAGAGAGCACGCAGGTTGTAGCCAAGTTGAGCGAC 
TTGCCGGTAGCACGGAAGCTAAGCTCAGGTGTTACAAATAGACAGGCGTCGAGGCGACGA 
GCACGACGACCTTGCCGGACATTGCGGTCGCAGGGGGCTCAAAGCGGTTGGCTTGTAACG 
GACCTTGTGTTTCTTGTTGTAGCTTTCATCGAGCATAACCATTGGGACGGTTGCTGAACA 
ACGGTAACGCACTTTTTTCACGGGAGCGAGGTAGAAGAACATATTTCCCCGTCGGCAGCC 
GGCGGTGAGCATGCCMTTCCTAAGGGATCAATGGACTCGTGCGAACGGTGAGCATGCCG 
TTCTGACCGTCGGTGCCCAATCAGCAGGCCACTCCCAACATGTTTTCCAAGTCCTTAAAA 
CCAGTCTTTATAGCATTGATCTCCCAGCAATCTTTATTGAAGTCGATTTTAATATTCAAA 
AGAAGATTTTAGTGGAAAGGGAATATAATCGCGTGGCCGAAGAAGAGCCTTCAAAAATCA 
GAATCCACTAGGATAAACAATAATATCTGAAAAGCATTGAATTTGGGTTAGGCACGAGAG 
GCTGACGCGGATGCCACTCGATTGCTAGTGGAAGGATTCCCTTTTTTCTAGCGTATCGAA 
TTCACCGTTTCACTATATGTTTTCCTGATTGGTTGATCTGCGGGACCACCATTGACTGCC 
ACTAATATCGAAAGTGGGTCTGCTTCGATTATGATGCTTTGTGAGAGGTTCTCTTCCCAA 
TGCATGCAAGCTGGCAGATTCGGATACTCTCAATAGAGATCTTATTTCGCGTCTCAAAAA 
G7TCCCAGAAATCAACAAAGGGGAGGGCAGGTCCTTTAAATACG1TGCAGCTGTCCTTTA 




14100 

14160 

14220 

14280 

14340 

14400 

14460 

14520 

14580 

14640 

14700 

14760 

14820 

14880 

14940 

15000 

15060 

15120 

15180 

15240 

15300 

15360 

15420 

15480 

15540 

15600 






FIGURE 2, Sheet 11

AAATAGAAGAGAATTTACAGCTGGAGGCACAGACCACTAAACTGCGAAAGTAAGCATGGC 15660 

AGATGAGTTGGAGCGTCAATTGGAAGCCATTTCTCTCATTACAGTCCTGGGTCCGGATGT 15720 

GAAGGCTGAGCTTGAGGCGGAGCTACGAGACTACTGCGAAGATCTCGA'CTTCTGGAAAAG 15780 

CCACGGTTTACCGGTGGCGGATCTCGATCAGACTGTGACTGTCGACAARrTTrTATAftAT 15840

GTATATGGATCGGGCAACAGCAGACCTGTGTGTGAAGAATCGCTGCCTCGTTTGCAACAG 15900 

TGGCAATTCAGCCGCAAAAGTAACCTCGCTTCCACCATACCTTGCAGGCGTGACAAGCGC 15960 

CGAGGCCTATGAGAAACTCAACTCCATTGTTGATGGGAGTGTCGCCCCCCAATCTCGTGG 16020 

GCCTCCCTGCTATTTTGTGGCGTTCCTGCCCAGCAGCTGTTTCGAGAAAACCAGTGAGAT 16080 

ATCGGTGCGCACAGTGGACGGCGAGTGTGGCCCCTTCGATGTCTTTACCCGGCAGCGTCA 16140 

GCCACAGGATCAGAGTGATATGTTTTTTAAATATGAAGGAGTTGTATGTGCTGGAAAGAG 16200 

TGTATTTATGTAAGAATTATCTTTTATAGCCTGTGTTACGTTTGAACCCGGTCCGCGCGG 16260 

TATTGTTTTCAATAAATGGTATGTGCGGAGGATATAATTGGTCTTTCATTGGTGTGATTT 16320 

ACGTGTAACGCGGATAATAATAAAGTAAATTACAAAAGAGAAACGCATAATTTTATTCCA 16380 

GAATGATTGCGAGAAACGATGAAAATACATGAAAATGCATATTGTCGCCAGGGAAGGATG 16440 

GCGCCGAAATAAACGAAACTGAGCCAATACAGTGACTTGCCAAGCGAGTTTGATCCTACC 16500 

AAATTCGCGCAAATTAATGCCCGTGTTCCATCGGGCCAGCGAGTTTATTCAAAAGAGTTT 16560 

CGTACACGTGGGCGGCGACGGCAACGTCAATGCTTGCTAGCCCTACCGGCGAGAAGTTGG 16620 

CCGGCCCCTTCCATGCCTTGAGGTCATTCATCAAGGCCTCGTCATCGAGAATTTCGGTGT 16680 

AGTTCTTGATCCCATCGCGCTTGCCGTGTTGGGTCAGTTTCATACCGCGCCTAGAATAGT 16740 

AGAGGGCAACGGCATCAACGTTGCGGGCTTCCATCGCAACAAGGTCATCGGCGACAATTA 16800 

GACCATCCGCAGATAGGACATGCTCAATGTAATCCGGCGGCATGTCATCAATACCGAGTG 16860 

ACAAAGTGACTGCGTTGGGGGCGATTTCAGCGGCTTCGAATACCGGTTTTCCGTAGTTGG 16920 

TCGCCATGATGACGAATTGAGAATATGGCAAAAGGCTACGATCGCCGACAGCTTCAAGGC 16980 

TAAAGGTTACGCAATCACGTAACTTTTCGACGAGCTCGAAATTGGATTTCTTACCGCGGC 17040 

TGAGCACTGCTACCTTACGAATTCTCTTAGCGGCACCATAGTTAAGTGAGAGAATTACAG 17100 

CTTCGGCAACTTTTCCAGCCCCAAACAAGAAAACGTCGATGTCCTCTCTGCCTTGCAACA 17160 
GCAGGTTTACGCATGCTAGCGAGAACCAACCCGTTCTTCCATTAGAAATTGCCACGCCCT 17220 

CTACCGACATAAGGAGCGTCCCGGACACCUGTCGCGCAGGAAAATATCGGAGTGCTGGA 17280 

GCGGCTTTCCGGTAGCGGCGTTGGTTGGCGCGAAGTGGATGTCTTTGGTGCCGGAATATC 17340 

TTCCGAAATAGCCAATGAGTGCTCCTTCAGTCCATCCAGGAACATTCTTGTTGAACGTTA 17400 

GGTAAGCTTTGACATGTCCGGCTTTTCCTGCGGCAAACACCTCCCAATAGGACTTGAGAG 17460 

CTTCGTCAACAAATGCTGGTGTGATCTGGATATCGAGGTTTGATAGTGCAGATTCAGTCC 17520 

AGTGTACCTCGCAAAGTTGTTTGGCCATCTGCCTTGTAGGTGCGAATTTTCTCTGCTCAA 17580 

ATTGTTGAGGTTAGCGGATTTGTAAACGCGTTTATATGGGCTGCTTGGAGGGTACTTTTG 17640 

GATTAATTTTTTTCTGCCAGCGCATTCTGACGCGGCACCGCTTTGGAAAGTGCGCTGTGG 17700 

GTCCGCGTTTTCTACAATAATGTGCCGATCCGGTCAGAAAGTATATGGATGAGTTGTGCC 17760 

AGCCTCACCAACGTGCTGCAGGCCCATCATGACTACTTCAATGTTAATGGGGGTAATGAA 17820 

TAAATAGGCGAAATTGGGTTCACGGTGGGCCCAGGGAATATAATATTGCCGCAGAGGTAG 17880 

TCGGATGCCAAGGCCCGCAACTAATAGTTCACGAACAAATTCATTGTAGTGGGCGGCCAA 17940 

CTCCAAAACCAATTGCCAGTTATTGTATTGCAATACATATATGAGTATTCGGATACAACT 18000 

AATTTCATTAAATAATATTTTAAGTGTGGACAGAATAGCGCCTAATAAATTTGCGAATGT 18060 

TGTCCAATTGACGTTTTTATAGGTAACTCGATAAATCGTGCTTTTGTGATATTCTGATGC 18120 

GGACAATATACATTTAAACATAAAGATATAAGTTATTGAGGCATTTATGTATATTACAAT 18180 

AGTGGGGTACATTTTTCACAGATGCTGTCACCCATGAAATATTGGCAAAATACTCTTAAA 18240 

ATATGCAAGAAACTAAAGAGGATGCATGGGTTGGGCTGTAGGTACATGGATGCAAATGcf 18300 

GTTTTGCAATAAGTCATATAGTCTCGTCTGTTGAGTGAGGCCCATTCAATCAGCAAGTAG 18360 

GACTGAGGTGCATGATCGACATATTTTTGAACCAGAGTTTTGGCAAGTTTTTCATACAAA 18420 

TGCACGGCTACGGCCAAATCGTAGCTTGCAAGTCCAACTGCTGAAAAGTTAGCCGGCCCG 18480 

TTCCAAGAAATTAGCCTTTGCATAAGGACTGGATCGCGGAGAACTTCAGAGTAGTTCCTG 18540 

ATCCCATTGTCCCTGCCGTGTTTTGTTAGCTTTAAATGGCGTCTTGAATAGTGCAGCGCC 18600 

AACGAGTCGATATTACGTGTTTCCATCGCATCCATATCATCTGCCACCACGATGCCACTC 18660 

AGCTTCAACACGTGATCAAAATAGTCAGCTGGCAATTCGTCAATTCCAAGCGTCAATGTA 18720 
ACGGCATTGTCTGTGATCTCCTTCATCTCAAAGACGGGCTTGTTTGAATTCGTCGCCGTA 18780 

ATTATGAACTTGGATTTGCTGAGATATGCTCGATTGTTAACAGCCTTGAGTGAAATCTTG 18840 

ACTTCCGGCTGAAGCCTTTGCACCAACTCATGGTTTGACTGGTTGCAGCGGCTGAGAATC 18900 


[illegible in source] 18960 

ACTTTACGTGCTCCGAATAGGAAGACATTGATCTGGCTTCGGCCCTGCAATAGGAGATTC 19020 

AGGCATGCTAGTGCCAGCCAACCAGTTCTCCTCTCCGATATAGCCACCCCATCAACAGAG 19080 

AAGAGACGTCTACCTGTGAAACGATTGCGAAGCCAACGTCGATGTGAGAAGTCGGTTCTT 19140 

TGTATCTCGCGTTTGACGGATTAGAATGGATGCTTTTCACACCCGAATAGTCGCCGACGA 19200 

AACCCACCAGAGCTCCCTCCGTACAGCCCTCTCGATCAAGTGGAACGAAGACCTTGTTGT 19260 

GGCCGAGCCGCCCTTCAGCAAAGAGGTGCCAATAATCTTTCAAGGCATCCGCGACGAGTT 19320 

CCGGTGTAATGTATATTCCAAAAGCCGATAGAGATTCCTCTGTCCAACATTGCTCGTGTA 19380 

TTTGATCGGCCATGTTTGTGTTTGATCAGCCTCCTTTCGAAAATTTCTTGAGTTTCGAAT 19440 

AATTCTAAAATCGAAGGACGATTAATAGTGCCATACCAAGACAAGAAGGGTAGGTGGGCC 19500 

ATCAATCCACAAGCCTAGCACATTTTGCTGTCTGCTCATGCAAGGTATCCAATGGAAGCC 19560 

TGGATTGGTTAGCGGAACTTGGTGGGTTCAATTGGAGCGGGCAGGTCACTTTTTGTCTCT 19620 

CAAATAACTGAAACTAAGTTTTGTTATTTGGTATGTGTTTGTCTGTTCTGCCGAAGGTGC 19680 

CCGAATTTGCGCAAATTCCTTTCTAAAAAGGCTTACATCTAGCAAAAGGTGAGCCCTGTG 19740 

CATCCCAGCATTTGGACAAAGCGCGCCAATTCGGACAGCGACTGGCTGCGTTGGAGGCTC 19800 

GGATCTCAAAGAATAGAAAAGAGTTATGATCATGTTCAGAACCGCCAATTTTGTGCGGTA 19860 

TGAGCTCTTTGATGAAAGTAATGGTTTCAAAAAAGCAACATCGTGGGTGAAAGGTACCTA 19920 

CATATCTTCACAGACAATAACTACTGTTGCTGTTTGCTGATTGACTGACAGGATATATGT 19980 

TCCTGTCATGTTTGTTCAATTGTTCAATTGTTCAATTGTTCAATTGTTCAATTGTTAATG 20040 

TATAAGTTCGTGATGAAGGATGGTTGTTTTAAAAATAGTATGTTTGACTGAGGTTAAGTC 20100 

ACTCACGTTTTGCACATCGACGGACCGTAAGCATTCTTTCGGTAAGACCGAAGCTCGTCC 20160 

CAGATAATAGGCCCCGTGGAGGGAGGCCTTGTATGGGCCGACCGATGGGCGTGCTGAGCC 20220 

GAGTACGGCGACGCCTGCGGCGATTGCGCGGGCGGCACTGCGCGCAGGGGCACGGGTTCA 20280 
TACGAGGACGAGCGTAAGGGGCATAGAGCTTTCCGCCCGTCGGGTTTCAGCCATATTGCT 20340 

TGATTGCGGCCGACTGGAATGCAGCCAGGTCGTGCTCGCCGGCGGCGCCTGGTCGAGCGG 20400 

CATGCTGCGCACCTCAGTGTTCGGCTTCCTCAGCTCACGGTTACTGTGTCGGCGCTAAGA 20460 

ACCGAGGCCTTTGATGGCGGACCTGGCCTTTTCAATCAAGGGATGCTGATTTTCCATCCG 20520 

TAAGCGTCTCGACGGTGGTTACACCGTCGGCTTCGGCGCGACGATGAGAACCGAAATCGT 20580 

CAGCGATAGCTTTCGTTTTCTGTCGGATTATTTCCTCCTGATGCGAGAGGAATGGCTTTC 20640 

TATCCGGCTGCCGGCGGGTGCGCGCGTCCGGATTGACCCTCCCGTTCGTTGCTACTTGGC 20700 

TCGAGTGACGAAATAGCACGCCTGTGCCGCTGTATCATGTCCATCGGGCTCACAGGAGAT 20760 

TCGCTCGTAGCGCGTTGGTGTCACTCACCAACACGCGTCGTCGCACCAAATTGGGGAGGA 20820 

TGGTAGCGGAATCCTAAAATCCTAAAACCATACCGACGCGTCACGGCGCTCGTGACCCCT 20880 

GCGAGCGACGCGGCACTCTCTCACCTGATCCGTGCTGCGGTTGCTCAATACGCAATGAGC 20940 

ATTGTCACGGTTCTCAGGGTAAACGGCAATCTCTTCGTCATGCGGGCGTGGATGCTATCA 21000 

CCGTTAGAAAGGGCCTGCCCCCATGGTGGGTCTCTAAGGTTCAGTCTGAGAAGGGGCAGC 21060 

CAGAGCGGCACTGTTTGAAGAGCAGTCTGAACCGCTCAGATCGCTCGCATCGATGCTTGG 21120 
GCGGCG 21126 
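Each row of Figure 2 gives a run of nucleotides followed by a position number. Judging from the last two rows (a 60-base row labelled 21120 followed by the six-base row GCGGCG labelled 21126), the number appears to be the position of the last base in the row. The following is a minimal sketch, under that assumption, for joining rows in this layout into one sequence and flagging labels that disagree with the running base count. Such disagreements are useful for spotting transcription errors. The names `parse_listing` and `ROW` are hypothetical and not part of the patent. The accepted alphabet includes IUPAC ambiguity codes because at least one row contains an M.

```python
import re

# Hypothetical row pattern: nucleotide letters (A, C, G, T plus IUPAC
# ambiguity codes) followed by whitespace and a position label.
ROW = re.compile(r"^([ACGTNMRWSYKVHDB]+)\s+(\d+)$")


def parse_listing(lines, start=0):
    """Join Figure 2-style rows into one sequence and check position labels.

    Assumption: each label is the position of the last base in its row.
    `start` is the position of the base just before the first row (supplied
    by the caller). Returns (sequence, mismatches), where mismatches is a
    list of (label, expected) pairs whose label disagrees with the count.
    """
    parts = []
    mismatches = []
    pos = start
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip the blank spacer lines between rows
        m = ROW.match(line)
        if m is None:
            # Rows still carrying OCR debris (digits, punctuation) land here.
            raise ValueError(f"unparseable row: {line!r}")
        bases, label = m.group(1), int(m.group(2))
        pos += len(bases)  # running position of the last base so far
        if label != pos:
            mismatches.append((label, pos))
        parts.append(bases)
    return "".join(parts), mismatches
```

For example, passing the final two rows of the listing with `start=21060` yields a 66-base sequence and an empty mismatch list. A row labelled 17300 that falls between rows labelled 17040 and 17160 would be reported as `(17300, 17100)`.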